Production Deployment¶
This guide covers deploying the Retail AI system to production environments.
📋 Prerequisites¶
Before deploying to production, ensure you have:
- Python 3.12+: Required runtime environment
- Databricks Workspace: With appropriate permissions and resources
- Unity Catalog: Enabled and configured
- Model Endpoints: Access to required LLM and embedding endpoints
Required Databricks Resources¶
- Unity Catalog: Data governance and function management
- Model Serving: For hosting LLM endpoints
- Vector Search: For semantic search capabilities
- Genie: Natural language to SQL conversion
- SQL Warehouse: For query execution
Default Model Endpoints¶
- LLM Endpoint: databricks-meta-llama-3-3-70b-instruct
- Embedding Model: databricks-gte-large-en
🚀 Deployment Process¶
Step 1: Environment Setup¶
1. Clone the Repository
2. Create Virtual Environment
3. Configure Environment
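A sketch of these three steps as shell commands. The repository URL and requirements file name are placeholders, not the project's actual values; substitute your own remote and follow the repository's README for credential configuration.

```shell
# 1. Clone the repository (URL is a placeholder for your fork/remote)
#    git clone https://github.com/<org>/<repo>.git && cd <repo>

# 2. Create and activate a virtual environment (Python 3.12+)
python3 -m venv .venv
. .venv/bin/activate

# 3. Install dependencies and configure credentials (names are assumptions)
#    pip install -r requirements.txt
#    then set your Databricks workspace host and token (e.g. in .env)
```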
Step 2: Data Setup¶
Run the data preparation notebooks in order:
```shell
# 1. Ingest and transform data
python 01_ingest-and-transform.py

# 2. Provision vector search
python 02_provision-vector-search.py

# 3. Generate evaluation data (optional)
python 03_generate_evaluation_data.py
```
Step 3: Model Development and Registration¶
This notebook will:
- Build the agent graph
- Log the model to MLflow
- Register the model in the MLflow Model Registry
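The flow above can be sketched as follows. This is an illustrative outline, not the notebook's actual code: the model name, artifact path, and use of the pyfunc flavor are all assumptions, and mlflow is imported lazily so the sketch stays loadable without it.

```python
def log_and_register_agent(model_name: str = "retail_ai_agent"):
    """Illustrative log-and-register flow; all names are assumptions."""
    import mlflow  # imported lazily: this is a sketch, not the real notebook

    with mlflow.start_run():
        # Log the agent; the project may use a different MLflow flavor
        info = mlflow.pyfunc.log_model(
            artifact_path="agent",
            python_model=None,  # placeholder: the built agent graph goes here
            registered_model_name=model_name,  # registers in the Model Registry
        )
    return info
```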
Step 4: Model Evaluation¶
This provides:
- Performance metrics
- Quality assessments
- Evaluation reports
Step 5: Production Deployment¶
This notebook handles:
- Model alias management (Champion)
- Endpoint deployment
- Permissions configuration
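The endpoint deployment step might look roughly like the sketch below, assuming the databricks-agents package; the function name is illustrative and permissions/alias handling are noted only in comments.

```python
def deploy_endpoint(model_name: str, version: int):
    """Sketch: deploy a registered model version to a serving endpoint.

    Assumes the databricks-agents package is installed; imported lazily
    so the sketch stays loadable without it.
    """
    from databricks import agents

    # Creates or updates a Model Serving endpoint for the registered model
    deployment = agents.deploy(model_name, version)
    # Permissions and the Champion alias are configured separately
    return deployment
```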
⚙️ Configuration¶
Configuration is managed through model_config.yaml. Key sections include:
Catalog and Database¶
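The original settings for this section are not reproduced here; a hedged sketch of what it might contain (both keys and values are illustrative, not the project's actual schema):

```yaml
catalog_name: retail_ai   # illustrative: your Unity Catalog catalog
database_name: retail     # illustrative: the schema holding tables and functions
```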
Model Endpoints¶
```yaml
resources:
  endpoints:
    llm_endpoint: "databricks-meta-llama-3-3-70b-instruct"
    embedding_endpoint: "databricks-gte-large-en"
```
Vector Search¶
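The vector search settings are not reproduced here; a hedged sketch using standard Databricks Vector Search concepts (every name below is a placeholder):

```yaml
vector_stores:
  products:                                   # illustrative store name
    endpoint_name: "retail_vs_endpoint"       # assumed endpoint name
    index_name: "catalog.schema.products_idx" # assumed index name
    embedding_model: "databricks-gte-large-en"
```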
Genie Configuration¶
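The Genie settings are likewise not reproduced; at minimum this section would identify the Genie space used for natural-language-to-SQL (key name is an assumption, and the ID is left as a placeholder):

```yaml
genie:
  space_id: "<genie-space-id>"  # the Genie space backing NL-to-SQL queries
```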
Application Settings¶
🔧 Production Usage¶
REST API Endpoint¶
Once deployed, the agent can be called via REST API:
```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="retail_ai_agent",
    inputs={
        "messages": [
            {"role": "user", "content": "Can you recommend a lamp to match my oak side tables?"}
        ],
        "custom_inputs": {
            "configurable": {
                "thread_id": "1",
                "tone": "friendly"
            }
        }
    }
)
```
Streamlit Application¶
Deploy the store management interface:
📊 Monitoring and Observability¶
MLflow Tracking¶
Enable comprehensive tracking:
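One way to enable tracking, sketched under assumptions: the experiment path and function name are illustrative, and `mlflow.langchain.autolog()` is used on the assumption that the agent is built on LangChain/LangGraph.

```python
def enable_tracking(experiment: str = "/Shared/retail_ai"):
    """Sketch: enable MLflow experiment tracking and automatic tracing.

    mlflow is imported lazily so the sketch stays loadable without it.
    """
    import mlflow

    mlflow.set_experiment(experiment)
    # Automatically trace LangChain/LangGraph calls made by the agent
    mlflow.langchain.autolog()
```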
Debug Logging¶
Enable debug logging for troubleshooting:
Performance Monitoring¶
Monitor key metrics:
- Response Time: End-to-end latency
- Accuracy: Response quality scores
- Usage: Request volume and patterns
- Errors: Error rates and types
🔒 Security Considerations¶
Access Control¶
- Unity Catalog Permissions: Ensure proper data access controls
- Model Serving Permissions: Restrict endpoint access
- API Authentication: Implement proper authentication
- Network Security: Configure VPC and firewall rules
Data Privacy¶
- PII Handling: Implement data anonymization
- Audit Logging: Enable comprehensive audit trails
- Encryption: Ensure data encryption at rest and in transit
- Compliance: Meet regulatory requirements
🚨 Troubleshooting¶
Common Issues¶
1. Tool Not Found
   - Verify tool registration in agent configuration
   - Check Unity Catalog function permissions
2. Type Errors
   - Validate Pydantic model definitions
   - Check field types and constraints
3. Database Errors
   - Verify Unity Catalog permissions
   - Check function names and schemas
4. Vector Search Issues
   - Verify endpoint status
   - Check index configuration and permissions
Debug Steps¶
1. Check Configuration
2. Test Endpoints
3. Verify Permissions
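The configuration check could be sketched as below. The helper name is hypothetical, PyYAML is imported lazily, and only the `resources` section (the one shown earlier in this guide) is checked; extend the tuple with whatever sections your config defines.

```python
def check_config(path: str = "model_config.yaml"):
    """Sketch: load model_config.yaml and verify expected sections exist."""
    import yaml  # PyYAML, imported lazily so the sketch stays loadable

    with open(path) as f:
        config = yaml.safe_load(f)

    # Only "resources" appears in this guide; add the rest of your sections
    for section in ("resources",):
        if section not in config:
            raise KeyError(f"missing config section: {section}")
    return config
```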
📈 Scaling Considerations¶
Performance Optimization¶
- Endpoint Scaling: Configure auto-scaling for model endpoints
- Caching: Implement response caching for common queries
- Load Balancing: Distribute traffic across multiple endpoints
- Resource Allocation: Optimize compute resources
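Response caching for common queries can be as simple as memoizing a lookup function. The helper below is hypothetical: in production the body would call the serving endpoint, and caching only suits queries whose answers are stable over the cache lifetime.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_answer(question: str) -> str:
    """Memoize answers for repeated questions (hypothetical helper)."""
    # Stand-in for the real serving-endpoint call
    return f"answer to: {question}"

# Repeated calls with the same question hit the cache, not the endpoint
first = cached_answer("store hours?")
second = cached_answer("store hours?")
```

In practice you would also normalize the question (case, whitespace) before using it as a cache key, and bound staleness with a TTL.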
Cost Optimization¶
- Endpoint Management: Scale down during low usage periods
- Query Optimization: Optimize SQL queries and vector searches
- Resource Monitoring: Track and optimize resource usage
- Cost Alerts: Set up cost monitoring and alerts
🔄 Continuous Deployment¶
CI/CD Pipeline¶
Set up automated deployment:
```yaml
# .github/workflows/deploy.yml
name: Deploy Retail AI

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to Databricks
        run: |
          # Run deployment scripts
          python 07_deploy_agent.py
```
Model Updates¶
For model updates:
- Development: Update model in development environment
- Testing: Run evaluation and quality checks
- Staging: Deploy to staging environment
- Production: Promote to production with alias update
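The final promotion step can be sketched with the MLflow alias API (mlflow >= 2.3); the function name is illustrative, and this assumes the serving endpoint resolves the model by its Champion alias.

```python
def promote_to_champion(model_name: str, version: int):
    """Sketch: promote a validated model version by moving the Champion alias.

    mlflow is imported lazily so the sketch stays loadable without it.
    """
    from mlflow import MlflowClient

    client = MlflowClient()
    # Repoint the alias; an alias-resolving endpoint picks up the new
    # version without redeployment
    client.set_registered_model_alias(model_name, "Champion", version)
```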
This deployment guide ensures a robust, secure, and scalable production deployment of the Retail AI system.