CONFIDENTIALITY NOTICE
This document contains confidential and proprietary information intended solely for the use of the individual or entity to whom it is addressed. If you are not the intended recipient, please notify the sender and delete this document. Unauthorized use is prohibited.
Time to Complete
Within 24 hours. (Note: We highly recommend using an AI coding tool, e.g. Cursor, to finish this homework.)
Design an AI chatbot that uses Retrieval-Augmented Generation (RAG) over uploaded documents.
1. Requirements
Functional Requirements
- Document Management
- Upload multiple document formats (PDF, DOC, TXT)
- Process and index documents asynchronously
- View document processing status
- Delete/update documents
- For the parsing library, please use our AnyParser: https://github.com/CambioML/any-parser
- Chat Interface
- Start new chat sessions
- Reference specific documents in conversations
- View source documents for answers
- Save and persist chat history
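As one possible reading of the Document Management requirements above, the sketch below shows asynchronous processing with a queryable status per document. It uses only the Python standard library. The names `DocStatus`, `Document`, `parse_and_index`, and `process_all` are hypothetical, and `parse_and_index` is a placeholder that does not call AnyParser; a real service would replace it with the actual parsing, embedding, and indexing steps, and would likely use a durable queue rather than an in-process one.

```python
import asyncio
import enum
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Formats listed in the requirements (assumed to be matched by file extension).
ALLOWED_EXTENSIONS = (".pdf", ".doc", ".txt")


class DocStatus(enum.Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    INDEXED = "indexed"
    FAILED = "failed"


@dataclass
class Document:
    filename: str
    doc_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: DocStatus = DocStatus.PENDING
    error: Optional[str] = None


async def parse_and_index(doc: Document) -> None:
    """Hypothetical placeholder for the real parse + embed + index step."""
    await asyncio.sleep(0)  # stands in for I/O-bound work
    if not doc.filename.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError(f"unsupported format: {doc.filename}")


async def worker(queue: asyncio.Queue) -> None:
    """Pull documents off the queue and record each status transition."""
    while True:
        doc = await queue.get()
        doc.status = DocStatus.PROCESSING
        try:
            await parse_and_index(doc)
            doc.status = DocStatus.INDEXED
        except Exception as exc:
            doc.status = DocStatus.FAILED
            doc.error = str(exc)
        finally:
            queue.task_done()


async def process_all(filenames: List[str], n_workers: int = 2) -> List[Document]:
    """Enqueue every upload, process them concurrently, return final states."""
    queue: asyncio.Queue = asyncio.Queue()
    docs = [Document(name) for name in filenames]
    for doc in docs:
        queue.put_nowait(doc)
    workers = [asyncio.create_task(worker(queue)) for _ in range(n_workers)]
    await queue.join()  # wait until every queued document has been handled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return docs
```

Calling `asyncio.run(process_all(["a.pdf", "b.txt"]))` returns the `Document` objects with their final status, which is the kind of state a status endpoint could expose.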
Non-Functional Requirements
- Programming Language:
- Use Python as the primary language for development.
- System Design:
- Employ a microservices-based architecture to ensure modularity and separation of concerns.
- The solution should be capable of running on a single machine during development for simplicity.
- Ensure the design can scale horizontally across multiple machines to handle increased load.
- Deployability:
- The solution should be designed with eventual deployment into a Kubernetes (K8s) cluster in mind.
- Use containerization (e.g., Docker) to package each microservice for consistent deployment.
- Scalability:
- Ensure the system can handle increased document uploads, processing, and user interactions by scaling individual components independently.
- Maintainability:
- Adopt clear APIs and modular design to allow for easy updates and future feature integration.
- Implement logging and monitoring to simplify debugging and operational oversight.
- English:
- Make sure all documentation, code comments, git commit messages, and PRs are in English.
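For the logging requirement under Maintainability, one common approach in a microservices setup is to emit one JSON object per log line, tagged with the service name, so that a log aggregator can filter by service. The following is a minimal sketch using only the standard `logging` and `json` modules. `JsonFormatter` and `get_logger` are hypothetical helper names, and the field set is an assumption rather than a required schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line tagged with a service name."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "service": self.service,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(service)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Each service would call `get_logger("ingest")` (or its own name) once at startup and log through the returned object. Writing to the standard streams fits containerized deployment, where the container runtime typically collects them.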
2. System Architecture (For reference)