Intelligent RAG assistant split across 3 independent microservices, orchestrated with Kubernetes, secured with RBAC and HashiCorp Vault, and monitored via the Prometheus/Loki/Grafana stack.
Development of an intelligent Retrieval-Augmented Generation (RAG) assistant split into 3 independent microservices for independent scalability. Deployed on a cloud environment with strict security policies and full observability.
This academic project explores modern cloud-native architecture patterns: microservices decomposition, container orchestration, infrastructure-as-code, and production-grade security. The RAG assistant answers user queries by retrieving relevant context from a knowledge base before generating responses.
The application is divided into 3 independent services, each deployable and scalable separately:
| Tool | Role | |------|------| | Prometheus | Metrics scraping (latency, throughput, error rate) | | Loki | Centralized log aggregation from all pods | | Grafana | Unified dashboards for metrics and logs |
Custom dashboards track AI generation latency P50/P95/P99 and retrieval hit rates.