
Testing Frameworks

Overview

This document describes the testing frameworks, libraries, and tools used throughout the Engineering AI Agent development lifecycle, and provides guidance on framework selection, configuration, best practices, and integration with CI/CD pipelines.

Unit Testing Frameworks

Python Testing

The system uses pytest as the primary unit testing framework for Python services:

# Example of a pytest fixture
import pytest
from unittest import mock


@pytest.fixture
def knowledge_base_client():
    """Create a mock knowledge base client for testing."""
    client = mock.Mock(spec=KnowledgeBaseClient)
    client.search.return_value = ["Mock result 1", "Mock result 2"]
    return client


def test_knowledge_retrieval(knowledge_base_client):
    """Test that knowledge retrieval works correctly."""
    retriever = KnowledgeRetriever(client=knowledge_base_client)
    results = retriever.get_relevant_knowledge("test query")

    assert len(results) == 2
    knowledge_base_client.search.assert_called_once_with("test query")
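
Because the mock is created with spec=KnowledgeBaseClient, calls to methods that do not exist on the real client fail immediately, which keeps the fixture honest as the client's interface evolves.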

JavaScript Testing

For frontend components and Node.js services, Jest is used:

// Example of a Jest test
describe('UserPreferences', () => {
  test('should load user preferences correctly', async () => {
    const mockResponse = { theme: 'dark', notifications: true };
    fetch.mockResponseOnce(JSON.stringify(mockResponse));

    const preferences = await UserPreferences.load('user123');

    expect(preferences.theme).toBe('dark');
    expect(preferences.notifications).toBe(true);
  });
});
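
Note that fetch.mockResponseOnce is not part of Jest itself; it comes from a fetch-mocking add-on such as jest-fetch-mock, which is enabled in the Jest setup file before tests that stub HTTP calls.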

API Testing

RESTful API Testing

FastAPI endpoints are tested using pytest with the TestClient:

from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_get_user_profile():
    """Test retrieving a user profile."""
    response = client.get(
        "/api/v1/users/profile/123",
        headers={"Authorization": "Bearer test_token"},
    )
    assert response.status_code == 200
    assert response.json()["id"] == "123"
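
For the Bearer token above to be accepted without a real authentication backend, FastAPI's dependency_overrides can swap the auth dependency for a stub. The sketch below assumes a dependency named get_current_user living in app.auth and a minimal user payload; both are assumptions about the application code, not confirmed by this document.

# Sketch: replace the auth dependency with a stub during tests.
# app.auth.get_current_user is an assumed module path.
import pytest

from app.main import app
from app.auth import get_current_user  # assumed location of the auth dependency


def fake_current_user():
    # Minimal payload matching what the profile endpoint is expected to need
    return {"id": "123", "role": "developer"}


@pytest.fixture(autouse=True)
def override_auth():
    app.dependency_overrides[get_current_user] = fake_current_user
    yield
    app.dependency_overrides.clear()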

GraphQL API Testing

GraphQL endpoints are tested with pytest and the gql client:

def test_execute_graphql_query(gql_client):
    """Test executing a GraphQL query."""
    query = """
        query GetUser($userId: ID!) {
            user(id: $userId) {
                id
                name
                email
            }
        }
    """
    variables = {"userId": "123"}
    result = gql_client.execute(query, variable_values=variables)

    assert "errors" not in result
    assert result["data"]["user"]["id"] == "123"
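
The gql_client fixture referenced above is not shown in this document. A minimal sketch, assuming the GraphQL endpoint is mounted on the FastAPI app at /api/v1/graphql and that tests want the raw response body (data plus any errors), could look like this; a production setup might instead use the gql library's HTTP transport.

# Hypothetical gql_client fixture; the /api/v1/graphql path is an assumption.
import pytest
from fastapi.testclient import TestClient

from app.main import app


class _GraphQLTestClient:
    """Posts GraphQL documents and returns the raw response body."""

    def __init__(self, http_client, endpoint="/api/v1/graphql"):
        self._http = http_client
        self._endpoint = endpoint

    def execute(self, query, variable_values=None):
        response = self._http.post(
            self._endpoint,
            json={"query": query, "variables": variable_values or {}},
        )
        response.raise_for_status()
        return response.json()  # {"data": ..., "errors": [...]}


@pytest.fixture
def gql_client():
    return _GraphQLTestClient(TestClient(app))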

Integration Testing

Service Integration Testing

Service integration is tested using Docker Compose for local service orchestration:

# docker-compose-test.yml snippet
version: '3.8'
services:
  api-gateway:
    image: eng-ai-agent/api-gateway:test
    ports:
      - "8000:8000"
    environment:
      - ROLE_SERVICE_URL=http://role-service:8001
      - TASK_MANAGER_URL=http://task-manager:8002
    depends_on:
      - role-service
      - task-manager

  role-service:
    image: eng-ai-agent/role-service:test
    ports:
      - "8001:8001"
    environment:
      - DB_CONNECTION_STRING=postgresql://test_user:test_pass@db:5432/test_db

  task-manager:
    image: eng-ai-agent/task-manager:test
    ports:
      - "8002:8002"
    environment:
      - DB_CONNECTION_STRING=postgresql://test_user:test_pass@db:5432/test_db

Python integration tests leverage the pytest-docker-compose plugin to start and stop this stack around the test run; once the services are up, tests exercise them over HTTP, as sketched below.
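
A minimal sketch of such an integration test, assuming the compose stack above is already running (for example, started by the plugin) and that the API gateway exposes a /health endpoint and a roles route on the published port; both paths are assumptions not confirmed by the compose snippet.

# Sketch of an integration test against the docker-compose-test.yml stack.
# The /health and /api/v1/roles paths and localhost:8000 address are assumptions.
import requests

API_GATEWAY_URL = "http://localhost:8000"


def test_gateway_is_reachable():
    response = requests.get(f"{API_GATEWAY_URL}/health", timeout=5)
    assert response.status_code == 200


def test_gateway_routes_to_role_service():
    # Assumed route exposed by the gateway and proxied to role-service
    response = requests.get(f"{API_GATEWAY_URL}/api/v1/roles", timeout=5)
    assert response.status_code == 200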

AI/ML Testing Frameworks

Model Evaluation

Custom evaluation frameworks measure AI model performance:

import re


def evaluate_model_response(model, prompt, expected_patterns, unwanted_patterns):
    """Evaluate a model response against expected and unwanted regex patterns."""
    response = model.generate(prompt)

    # Check for expected patterns
    pattern_matches = 0
    for pattern in expected_patterns:
        if re.search(pattern, response):
            pattern_matches += 1

    # Check for unwanted patterns
    unwanted_matches = 0
    for pattern in unwanted_patterns:
        if re.search(pattern, response):
            unwanted_matches += 1

    # Fraction of expected patterns matched, scaled down by the fraction of
    # unwanted patterns that appeared; guard against empty pattern lists
    coverage = pattern_matches / len(expected_patterns) if expected_patterns else 0.0
    penalty = unwanted_matches / len(unwanted_patterns) if unwanted_patterns else 0.0
    score = coverage * (1 - penalty)
    return score, response
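
The resulting score is the fraction of expected patterns matched, scaled down by the share of unwanted patterns that appear, so a response that matches every expected pattern but also triggers every unwanted one scores zero.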

Prompt Testing

A specialized framework tests prompt effectiveness across different scenarios, typically by running a table of prompt cases through an evaluator such as the one above and asserting a minimum score.
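
A minimal sketch of such a prompt test, assuming a pytest model fixture, the evaluate_model_response helper above, and an illustrative 0.8 score threshold (all assumptions):

# Hypothetical prompt-effectiveness test; the prompts, patterns, threshold, and
# the `model` fixture are illustrative assumptions.
import pytest

PROMPT_CASES = [
    {
        "prompt": "Summarize this bug report and list the steps to reproduce it.",
        "expected_patterns": [r"(?i)steps to reproduce", r"(?i)summary"],
        "unwanted_patterns": [r"(?i)as an ai", r"(?i)i cannot"],
    },
]


@pytest.mark.parametrize("case", PROMPT_CASES)
def test_prompt_effectiveness(model, case):
    score, response = evaluate_model_response(
        model, case["prompt"], case["expected_patterns"], case["unwanted_patterns"]
    )
    assert score >= 0.8, f"Prompt scored {score:.2f}; response was: {response[:200]}"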

E2E Testing Framework

Robot Framework handles end-to-end testing with a behavior-driven approach:

*** Settings ***
Documentation     E2E tests for the code review workflow
Library           SeleniumLibrary
Test Setup        Open Browser To Login Page
Test Teardown     Close Browser

*** Variables ***
${LOGIN URL}      https://app.engineering-ai-agent.com/login
${BROWSER}        chrome

*** Test Cases ***
Valid Code Review Request
    Login As Valid User    dev@example.com    password123
    Navigate To Code Review Page
    Submit Code Review Request    https://github.com/org/repo/pull/123
    Wait Until Element Is Visible    id:review-status
    Element Should Contain    id:review-status    In Progress
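
Robot Framework treats runs of two or more spaces as the separator between keywords and their arguments, so the spacing above is syntactically significant; user keywords such as Login As Valid User would be defined in a project resource file or a *** Keywords *** section (not shown here).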

Performance Testing

Locust is used for performance and load testing:

from locust import HttpUser, task, between


class EngineeringAIUser(HttpUser):
    wait_time = between(1, 5)

    def on_start(self):
        # Login
        self.client.post("/login", json={
            "username": "test_user",
            "password": "test_password"
        })

    @task(3)
    def view_dashboard(self):
        self.client.get("/dashboard")

    @task(1)
    def create_task(self):
        self.client.post("/tasks", json={
            "title": "Test task",
            "description": "Created during load test",
            "priority": "medium"
        })
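
The @task weights make dashboard views three times as frequent as task creation for each simulated user, and wait_time = between(1, 5) inserts a one-to-five-second pause between tasks to approximate real usage patterns.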

Test Data Management

Fixtures

Test data fixtures are stored in JSON/YAML format and versioned with the codebase:

# fixtures/test_repositories.yaml
- id: repo1
  name: engineering-ai-agent
  owner: org
  url: https://github.com/org/engineering-ai-agent
  primary_branch: main
  languages:
    - python
    - typescript
  permissions:
    admin: true
    push: true
    pull: true
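
A minimal sketch of exposing this file to tests as a pytest fixture, assuming PyYAML is installed and the fixtures/ directory sits alongside the test module (the layout is an assumption):

# Hypothetical loader for fixtures/test_repositories.yaml.
from pathlib import Path

import pytest
import yaml

FIXTURE_DIR = Path(__file__).parent / "fixtures"


@pytest.fixture
def test_repositories():
    with open(FIXTURE_DIR / "test_repositories.yaml") as f:
        return yaml.safe_load(f)


def test_repository_fixture_shape(test_repositories):
    repo = test_repositories[0]
    assert repo["primary_branch"] == "main"
    assert "python" in repo["languages"]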

Data Factories

Factory pattern generates test data programmatically:

import uuid
from datetime import datetime


class UserFactory:
    """Factory for creating test users."""

    @staticmethod
    def create(role="developer", custom_attributes=None):
        """Create a test user with the given role."""
        user_id = f"user_{uuid.uuid4().hex[:8]}"
        attributes = {
            "id": user_id,
            "name": f"Test {role.capitalize()}",
            "email": f"{user_id}@example.com",
            "role": role,
            "created_at": datetime.utcnow().isoformat(),
            "preferences": {
                "theme": "light",
                "notifications": True
            }
        }

        if custom_attributes:
            attributes.update(custom_attributes)

        return attributes
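
Example usage in a test (the "qa" role and the overridden preferences are illustrative):

def test_user_factory_applies_overrides():
    user = UserFactory.create(
        role="qa",
        custom_attributes={"preferences": {"theme": "dark", "notifications": False}},
    )
    assert user["role"] == "qa"
    assert user["preferences"]["theme"] == "dark"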

CI/CD Integration

Test execution is integrated into the CI/CD pipeline using GitHub Actions:

# .github/workflows/test.yml snippet
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements-dev.txt
      - name: Run unit tests
        run: pytest tests/unit --cov=app --cov-report=xml
      - name: Run integration tests
        run: pytest tests/integration
      - name: Upload coverage
        uses: codecov/codecov-action@v2

Detailed Design and Specifications

This section will cover detailed specifications for test framework configurations, custom test harnesses, reporting integrations, and implementation guidelines for various test types.