DevOps & Cloud · ⭐ Featured

RAG Assistant — Microservices & Cloud Deployment

Intelligent RAG assistant split across 3 independent microservices, orchestrated with Kubernetes, secured with RBAC and HashiCorp Vault, and monitored via the Prometheus/Loki/Grafana stack.

2026

Completed (January 2026)

1 member

Technologies Used

Docker, Kubernetes, GitLab CI, Prometheus, Loki, Grafana, HashiCorp Vault, RBAC, Python, RAG

An intelligent Retrieval-Augmented Generation (RAG) assistant built as 3 independent microservices so that each one can scale on its own. It is deployed in a cloud environment with strict security policies and full observability.

🎯 Project Overview

This academic project explores modern cloud-native architecture patterns: microservices decomposition, container orchestration, infrastructure-as-code, and production-grade security. The RAG assistant answers user queries by retrieving relevant context from a knowledge base before generating responses.

🏗️ Microservices Architecture

The application is divided into 3 independent services, each deployable and scalable separately:

Retrieval Service — Vector similarity search over the knowledge base
Generation Service — Language model interface for response generation with retrieved context
API Gateway — Single entry point, routing, authentication, and rate limiting
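
To make the division of labor concrete, here is a minimal single-process sketch of the retrieve-then-generate flow. It is not the project's actual code: the function names, the in-memory `index`, the prompt layout, and the `generate` callable (a stand-in for the language model behind the Generation Service) are all assumptions made for illustration. In the deployed system each step would sit behind its own service and be reached through the API Gateway.

```python
import math
from typing import Callable

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Vector, index: list[tuple[str, Vector]], top_k: int = 2) -> list[str]:
    """Retrieval step (hypothetical): return the top_k passages most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Generation step, part 1 (hypothetical layout): put retrieved context before the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


def answer(
    question: str,
    query_vec: Vector,
    index: list[tuple[str, Vector]],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> str:
    """Gateway-level flow: retrieve first, then generate from the augmented prompt."""
    passages = retrieve(query_vec, index, top_k)
    return generate(build_prompt(question, passages))
```

Passing `generate` in as a callable keeps the sketch free of any specific model API and mirrors the idea that the generation backend can change without touching retrieval.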

☁️ Containerization & Orchestration

Docker — Each microservice containerized with optimized multi-stage builds
Kubernetes (K8s) — Cluster orchestration with independent HPA per service
Network Policies — Strict inter-service communication rules (least privilege)
RBAC — Role-Based Access Control for Kubernetes resources
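
For intuition about per-service autoscaling, the sketch below reproduces the scaling rule described in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0. The metric values, replica bounds, and the 0.1 tolerance used here are illustrative assumptions, not settings taken from this project.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA scaling decision for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Close enough to the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds declared on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))
```

Because each service has its own autoscaler, a spike in generation latency or CPU can add Generation Service pods without changing the replica count of the Retrieval Service or the API Gateway.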

🔐 Security

HashiCorp Vault — Dynamic secrets management; no hardcoded credentials
RBAC — Fine-grained access control on Kubernetes namespaces and resources
Network Policies — Pod-to-pod communication restricted to declared flows

🚀 CI/CD Pipeline (GitLab CI)

Docker image build and push on commit
Kubernetes manifest linting
Automated deployment to staging/production namespaces
Rollback triggers on health check failure
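
The rollback step can be pictured as a small gate that runs after deployment. The sketch below is one possible shape for it, under assumptions: the `probe` callable, the number of attempts, and the threshold of three consecutive failures are hypothetical and not taken from the project's pipeline. In a real job the probe would typically call a service health endpoint, and a "rollback" result would trigger something like `kubectl rollout undo`.

```python
from typing import Callable, Iterable


def should_roll_back(check_results: Iterable[bool], failure_threshold: int = 3) -> bool:
    """Return True once failure_threshold consecutive health checks have failed."""
    consecutive_failures = 0
    for healthy in check_results:
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return True
    return False


def gate_deployment(
    probe: Callable[[], bool],
    attempts: int = 5,
    failure_threshold: int = 3,
) -> str:
    """Run the probe up to `attempts` times and decide whether to promote or roll back."""
    results = (probe() for _ in range(attempts))
    return "rollback" if should_roll_back(results, failure_threshold) else "promote"
```

Counting consecutive failures rather than total failures avoids rolling back on a single slow start, at the cost of reacting a little later to a genuinely broken release.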

📊 Observability Stack (PLG)

| Tool | Role |
|------|------|
| Prometheus | Metrics scraping (latency, throughput, error rate) |
| Loki | Centralized log aggregation from all pods |
| Grafana | Unified dashboards for metrics and logs |

Custom dashboards track AI generation latency P50/P95/P99 and retrieval hit rates.
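
As a reference for what those dashboard figures mean, here is a small sketch that computes P50/P95/P99 from raw latency samples using the nearest-rank method, plus a simple hit rate. This is an illustration only: Prometheus usually estimates quantiles from histogram buckets (for example with `histogram_quantile`) rather than from raw samples, and the hit-rate definition used here (share of queries with at least one relevant passage retrieved) is an assumption, since the project does not spell out its definition.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples: list[float]) -> dict[str, float]:
    """P50/P95/P99 of generation latency samples (units are whatever the samples use)."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


def hit_rate(hits: int, total: int) -> float:
    """Share of queries counted as retrieval hits (assumed definition); 0.0 when there is no traffic."""
    return hits / total if total else 0.0
```

Tracking P95 and P99 alongside the median matters for a generation workload because a few slow model calls can dominate user-perceived latency while leaving P50 almost unchanged.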