Deployment Workflow
This document describes the deployment workflow in the Engineering AI Agent system, focusing on how code changes move from development to production environments.
Overview
The deployment workflow encompasses several stages:
- Environment preparation
- Build process
- Testing in staging
- Production deployment
- Monitoring and rollback procedures
Workflow Stages
1. Environment Preparation
Primary Roles: RD (Research & Development), SRE (Site Reliability Engineering)
Before deployment, the environment is prepared:
- Configuration files are updated for the target environment
- Environment variables are set appropriately
- Infrastructure is provisioned if needed
- Dependencies are updated to required versions
Example environment configuration:
# config/environment/staging.yaml
environment: staging
logging:
level: info
format: json
database:
host: ${DB_HOST}
port: ${DB_PORT}
name: engineering_agent_staging
pool_size: 10
services:
slack:
enabled: true
github:
enabled: true
jira:
enabled: true
clickup:
enabled: false
2. Build Process
Primary Role: CI/CD Pipeline
The build process prepares the application for deployment:
- Source code is fetched from the repository
- Dependencies are installed
- Code is compiled or bundled if necessary
- Tests are run to verify functionality
- Artifacts are packaged for deployment
Example build workflow:
name: Build
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt
- name: Run tests
run: pytest
- name: Build package
run: |
pip install build
python -m build
- name: Upload artifact
uses: actions/upload-artifact@v3
with:
name: dist
path: dist/
3. Testing in Staging
Primary Roles: QA (Quality Assurance), SRE (Site Reliability Engineering)
Before production deployment, changes are tested in a staging environment:
- Built artifacts are deployed to the staging environment
- Integration with external services is verified
- Performance and security testing is performed
- User acceptance testing may be conducted
- Smoke tests verify basic functionality
Example staging deployment workflow:
name: Deploy to Staging
on:
workflow_run:
workflows: ["Build"]
branches: [develop]
types: [completed]
jobs:
deploy-staging:
runs-on: ubuntu-latest
if: ${{ github.event.workflow_run.conclusion == 'success' }}
steps:
- name: Download artifact
uses: actions/download-artifact@v3
with:
name: dist
path: dist/
- name: Deploy to staging
env:
DEPLOY_TOKEN: ${{ secrets.STAGING_DEPLOY_TOKEN }}
run: |
./scripts/deploy.sh --env staging --token $DEPLOY_TOKEN
- name: Run smoke tests
run: |
./scripts/smoke_tests.sh --env staging
4. Production Deployment
Primary Roles: SRE (Site Reliability Engineering), RD (Research & Development)
Once verified in staging, changes are deployed to production:
- Deployment may follow different strategies (blue/green, canary, etc.)
- Database migrations are applied if necessary
- Configuration is updated for production
- Traffic is gradually shifted to the new version
- Deployment is monitored for issues
Example production deployment workflow:
name: Deploy to Production
on:
workflow_dispatch:
inputs:
version:
description: 'Version to deploy'
required: true
strategy:
description: 'Deployment strategy'
required: true
default: 'blue-green'
type: choice
options:
- blue-green
- canary
- rolling
jobs:
deploy-production:
runs-on: ubuntu-latest
steps:
- name: Checkout deploy scripts
uses: actions/checkout@v3
with:
sparse-checkout: scripts/
- name: Deploy to production
env:
DEPLOY_TOKEN: ${{ secrets.PROD_DEPLOY_TOKEN }}
VERSION: ${{ github.event.inputs.version }}
STRATEGY: ${{ github.event.inputs.strategy }}
run: |
./scripts/deploy.sh --env production --token $DEPLOY_TOKEN --version $VERSION --strategy $STRATEGY
- name: Verify deployment
run: |
./scripts/verify_deployment.sh --env production
5. Monitoring and Rollback
Primary Role: SRE (Site Reliability Engineering)
After deployment, the system is monitored:
- Performance metrics are tracked
- Error rates are monitored
- User-impacting issues are identified
- Rollback procedures are executed if necessary
Example monitoring alert configuration:
# prometheus-alerts.yaml
groups:
- name: engineering_agent_alerts
rules:
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: High error rate detected
description: Error rate exceeded 5% for more than 2 minutes
- alert: SlowResponses
expr: http_request_duration_seconds{quantile="0.9"} > 2
for: 5m
labels:
severity: warning
annotations:
summary: Slow API responses detected
description: 90th percentile response time is above 2 seconds for 5 minutes
Deployment Strategies
Blue-Green Deployment
- Two identical environments are maintained (Blue and Green)
- Changes are deployed to the inactive environment
- Tests verify the new deployment
- Traffic is switched from the active to the inactive environment
- Allows for immediate rollback by switching traffic back
Canary Deployment
- New version is deployed alongside the old version
- Small percentage of traffic is routed to the new version
- Traffic percentage gradually increases if no issues are found
- Allows for early detection of issues with minimal impact
- Rollback requires scaling down the new version
Rolling Deployment
- New version is deployed gradually across instances
- One or more instances are updated at a time
- Healthchecks verify each updated instance
- Process continues until all instances are updated
- Balances deployment speed with risk mitigation
Versioning and Release Management
Semantic Versioning
The system follows semantic versioning (SemVer):
- Format:
MAJOR.MINOR.PATCH
(e.g.,1.2.3
) - MAJOR: Incompatible API changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes (backward compatible)
Release Branching Strategy
main
branch represents the current production statedevelop
branch is used for ongoing development- Feature branches branch off from and merge back to
develop
- Release branches are created from
develop
- Hotfixes branch directly from
main
and merge to bothmain
anddevelop
Release Process
- Create a release branch (
release/vX.Y.Z
) fromdevelop
- Perform final testing and bug fixes in the release branch
- Merge the release branch to
main
- Tag the release in
main
(vX.Y.Z
) - Merge the release branch back to
develop
Best Practices
Configuration Management
- Environment-specific configuration remains separate from code
- Sensitive values are stored in secure vaults
- Configuration is versioned and auditable
- Changes to configuration follow the same review process as code
Database Changes
- Database migrations are automated
- Migrations are backward compatible when possible
- Large data migrations are performed in batches
- Rollback procedures are defined for database changes
Deployment Safety
- Feature flags control functionality exposure
- Circuit breakers prevent cascading failures
- Rate limiting protects against unexpected load
- Automated rollbacks trigger on error thresholds
Documentation and Communication
- Release notes document all changes
- Stakeholders are notified before significant deployments
- Post-deployment status is communicated to the team
- Deployment schedules respect low-traffic periods
Example Workflow
-
Release Planning:
- Features are prioritized for the next release
- Version number is assigned based on changes
- Release branch is created from
develop
-
Preparation:
- Final testing is performed in the release branch
- Release notes are prepared
- Deployment plan is reviewed
-
Staging Deployment:
- Release is deployed to staging environment
- Integration and acceptance testing is performed
- Issues are fixed directly in the release branch
-
Production Deployment:
- Stakeholders are notified of the upcoming deployment
- Deployment is scheduled during a low-traffic period
- Release is deployed following the chosen strategy
-
Post-Deployment:
- System is monitored for anomalies
- Success or issues are communicated to the team
- Lessons learned are documented for future deployments