BigDataTool/docs/api-design.md
2025-08-05 23:27:25 +08:00


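DataTools Pro API Design Document

1. API Overview

1.1 Design Principles

  • RESTful design: follows the REST architectural style
  • Uniform format: standardized request and response formats
  • Versioning: supports API version management
  • Error handling: a complete set of error codes and error messages
  • Security: input validation and access control

1.2 Basic Information

  • Base URL: http://localhost:5000
  • Content-Type: application/json
  • Character encoding: UTF-8
  • API version: v1.0

1.3 Response Format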

{
    "success": true,
    "data": {},
    "message": "Operation successful",
    "timestamp": "2024-08-05T10:30:00Z",
    "request_id": "uuid-string"
}
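The envelope above could be produced by a small helper along the following lines. This is a minimal sketch, not the project's implementation: the function name `make_response` is hypothetical, and the choice of a random UUID for `request_id` is an assumption based only on the `"uuid-string"` placeholder.

```python
import uuid
from datetime import datetime, timezone


def make_response(data, message, success=True):
    """Build a response envelope with the five fields shown above.

    Hypothetical helper: the name and signature are assumptions.
    """
    return {
        "success": success,
        "data": data,
        "message": message,
        # UTC timestamp in the same ISO 8601 "Z" form as the sample.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumed: one random UUID per response.
        "request_id": str(uuid.uuid4()),
    }
```

Passing `data` through unchanged lets endpoints return an object, a list, or `null`, all of which appear in the samples in this document.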

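2. Core API Endpoints

2.1 Cassandra Data Comparison API

2.1.1 Run a Single-Table Query Comparison

Endpoint: POST /api/query

Purpose: runs a Cassandra single-table data query and comparison analysis

Request parameters: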

{
    "pro_config": {
        "cluster_name": "production-cluster",
        "datacenter": "datacenter1",
        "hosts": ["10.0.1.100", "10.0.1.101"],
        "port": 9042,
        "username": "cassandra",
        "password": "password",
        "keyspace": "production_ks",
        "table": "user_data"
    },
    "test_config": {
        "cluster_name": "test-cluster",
        "datacenter": "datacenter1", 
        "hosts": ["10.0.2.100"],
        "port": 9042,
        "username": "cassandra",
        "password": "password",
        "keyspace": "test_ks",
        "table": "user_data"
    },
    "keys": ["user_id"],
    "values": ["1001", "1002", "1003"],
    "fields_to_compare": ["name", "email", "status"],
    "exclude_fields": ["created_at", "updated_at"]
}

Response data:

{
    "success": true,
    "data": {
        "total_keys": 3,
        "pro_count": 3,
        "test_count": 2,
        "differences": [
            {
                "key": {"user_id": "1001"},
                "field": "email",
                "pro_value": "user1@prod.com",
                "test_value": "user1@test.com",
                "message": "Field values do not match"
            }
        ],
        "identical_results": [
            {
                "key": {"user_id": "1002"},
                "pro_fields": {"name": "User2", "email": "user2@example.com"},
                "test_fields": {"name": "User2", "email": "user2@example.com"}
            }
        ],
        "field_diff_count": {
            "email": 1
        },
        "raw_pro_data": [...],
        "raw_test_data": [...],
        "summary": {
            "overview": "Queried 3 keys and found 1 difference",
            "percentages": {
                "match_rate": 66.67,
                "diff_rate": 33.33
            },
            "field_analysis": {
                "email": {"diff_count": 1, "diff_rate": 33.33}
            },
            "recommendations": ["Check data synchronization for the email field"]
        }
    },
    "message": "Query comparison completed",
    "execution_time": 1.25,
    "timestamp": "2024-08-05T10:30:00Z"
}
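How `keys`, `fields_to_compare`, and `exclude_fields` might interact can be sketched as follows. This is an illustration only: `compare_rows` is a hypothetical name, rows are assumed to be plain dicts, and treating an empty `fields_to_compare` as "all non-key columns" is an assumption the document does not state. Rows missing from one side are skipped here.

```python
def compare_rows(pro_rows, test_rows, keys, fields_to_compare=None, exclude_fields=None):
    """Compare rows from two environments, matching them on the key columns."""
    exclude = set(exclude_fields or [])

    def index(rows):
        # Map each row's key tuple to the row itself.
        return {tuple(row[k] for k in keys): row for row in rows}

    pro_index, test_index = index(pro_rows), index(test_rows)
    differences, identical_results, field_diff_count = [], [], {}

    for key_tuple, pro_row in pro_index.items():
        test_row = test_index.get(key_tuple)
        if test_row is None:
            continue  # rows missing on one side are not modelled in this sketch
        # Assumption: an empty field list means "every non-key column".
        fields = fields_to_compare or [f for f in pro_row if f not in keys]
        fields = [f for f in fields if f not in exclude]
        key = dict(zip(keys, key_tuple))
        mismatched = [f for f in fields if pro_row.get(f) != test_row.get(f)]
        for field in mismatched:
            differences.append({
                "key": key,
                "field": field,
                "pro_value": pro_row.get(field),
                "test_value": test_row.get(field),
            })
            field_diff_count[field] = field_diff_count.get(field, 0) + 1
        if not mismatched:
            identical_results.append({
                "key": key,
                "pro_fields": {f: pro_row.get(f) for f in fields},
                "test_fields": {f: test_row.get(f) for f in fields},
            })

    return {
        "differences": differences,
        "identical_results": identical_results,
        "field_diff_count": field_diff_count,
    }
```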

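2.1.2 Run a Sharded-Table Query Comparison

Endpoint: POST /api/sharding-query

Purpose: runs a Cassandra sharded-table data query and comparison analysis

Request parameters:

{
    "pro_config": { /* same as the single-table query configuration */ },
    "test_config": { /* same as the single-table query configuration */ },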
    "keys": ["doc_id"],
    "values": ["wmid_1609459200", "wmid_1609545600"],
    "fields_to_compare": ["content", "status"],
    "exclude_fields": [],
    "sharding_config": {
        "use_sharding_for_pro": true,
        "use_sharding_for_test": false,
        "interval_seconds": 604800,
        "table_count": 14
    }
}

Response data:

{
    "success": true,
    "data": {
        /* Base comparison results are the same as for the single-table query */
        "sharding_info": {
            "pro_shard_mapping": {
                "wmid_1609459200": "user_data_0",
                "wmid_1609545600": "user_data_1"
            },
            "test_shard_mapping": {
                "wmid_1609459200": "user_data",
                "wmid_1609545600": "user_data"
            },
            "failed_keys": [],
            "shard_stats": {
                "pro_tables_used": ["user_data_0", "user_data_1"],
                "test_tables_used": ["user_data"],
                "timestamp_extraction_success_rate": 100.0
            }
        }
    },
    "message": "Sharded query comparison completed",
    "execution_time": 2.15
}
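The document does not specify how a key is mapped to a shard table. The sketch below shows one plausible scheme, purely as an assumption: take the trailing integer of the key as a Unix timestamp, bucket it by `interval_seconds`, and wrap by `table_count`. All function names are hypothetical, and the table names this formula yields are not expected to match the sample mapping above.

```python
import re


def extract_timestamp(key):
    """Return the trailing integer of a key such as 'wmid_1609459200', or None."""
    match = re.search(r"(\d+)$", key)
    return int(match.group(1)) if match else None


def shard_table(base_table, timestamp, interval_seconds, table_count):
    # Assumed formula, not taken from the document:
    # bucket the timestamp by interval, then wrap around the table count.
    index = (timestamp // interval_seconds) % table_count
    return f"{base_table}_{index}"


def map_keys(keys, base_table, interval_seconds, table_count):
    """Return (shard_mapping, failed_keys, extraction_success_rate_percent)."""
    mapping, failed = {}, []
    for key in keys:
        timestamp = extract_timestamp(key)
        if timestamp is None:
            failed.append(key)  # no timestamp could be extracted
        else:
            mapping[key] = shard_table(base_table, timestamp, interval_seconds, table_count)
    rate = 100.0 * (len(keys) - len(failed)) / len(keys) if keys else 0.0
    return mapping, failed, rate
```

Keys that yield no timestamp end up in the failed list, which corresponds to the `failed_keys` and `timestamp_extraction_success_rate` fields in the response.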

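2.2 Redis Cluster Comparison API

2.2.1 Run a Redis Cluster Comparison

Endpoint: POST /api/redis/compare

Purpose: runs a data comparison analysis between two Redis clusters

Request parameters:

{
    "cluster1_config": {
        "name": "Production cluster",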
        "nodes": [
            {"host": "10.0.1.100", "port": 6379},
            {"host": "10.0.1.101", "port": 6380}
        ],
        "password": "redis_password",
        "socket_timeout": 3,
        "socket_connect_timeout": 3,
        "max_connections_per_node": 16
    },
    "cluster2_config": {
        "name": "Test cluster",
        "nodes": [{"host": "10.0.2.100", "port": 6379}],
        "password": null
    },
    "query_mode": "specified",
    "keys": ["user:1001", "user:1002", "session:abc123"],
    "sample_config": {
        "count": 100,
        "pattern": "*",
        "source_cluster": "cluster2"
    }
}

Response data:

{
    "success": true,
    "data": {
        "total_keys": 3,
        "cluster1_found": 2,
        "cluster2_found": 3,
        "differences": [
            {
                "key": "user:1001",
                "cluster1_value": "{\"name\":\"John\",\"age\":25}",
                "cluster2_value": "{\"name\":\"John\",\"age\":26}",
                "value_type": "string",
                "difference_type": "value_mismatch"
            }
        ],
        "identical": [
            {
                "key": "user:1002",
                "value": "{\"name\":\"Jane\",\"age\":30}",
                "value_type": "string"
            }
        ],
        "missing_in_cluster1": ["session:abc123"],
        "missing_in_cluster2": [],
        "cluster_stats": {
            "cluster1": {
                "connection_status": "connected",
                "response_time_avg": 0.15,
                "nodes_status": [
                    {"host": "10.0.1.100", "port": 6379, "status": "connected"},
                    {"host": "10.0.1.101", "port": 6380, "status": "connected"}
                ]
            },
            "cluster2": {
                "connection_status": "connected",
                "response_time_avg": 0.12,
                "nodes_status": [
                    {"host": "10.0.2.100", "port": 6379, "status": "connected"}
                ]
            }
        },
        "performance_summary": {
            "total_execution_time": 0.85,
            "keys_per_second": 3.53,
            "data_transferred_kb": 2.1
        }
    },
    "message": "Redis cluster comparison completed"
}
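The classification into `differences`, `identical`, and the two `missing_in_*` lists can be sketched as below. This is an assumption-laden illustration: `compare_key_values` is a hypothetical name, each cluster's data is assumed to have been fetched already into a dict of key to string value, and `value_type` detection and cluster statistics are left out.

```python
def compare_key_values(keys, cluster1_data, cluster2_data):
    """Classify each requested key by comparing the two clusters' values."""
    result = {
        "total_keys": len(keys),
        "cluster1_found": 0,
        "cluster2_found": 0,
        "differences": [],
        "identical": [],
        "missing_in_cluster1": [],
        "missing_in_cluster2": [],
    }
    for key in keys:
        in1, in2 = key in cluster1_data, key in cluster2_data
        result["cluster1_found"] += in1
        result["cluster2_found"] += in2
        if not in1:
            result["missing_in_cluster1"].append(key)
        if not in2:
            result["missing_in_cluster2"].append(key)
        if in1 and in2:
            v1, v2 = cluster1_data[key], cluster2_data[key]
            if v1 == v2:
                result["identical"].append({"key": key, "value": v1})
            else:
                result["differences"].append({
                    "key": key,
                    "cluster1_value": v1,
                    "cluster2_value": v2,
                    "difference_type": "value_mismatch",
                })
    return result
```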

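2.3 Configuration Management API

2.3.1 Get the Default Configuration

Endpoint: GET /api/default-config

Purpose: returns the system's default database configuration

Response data: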

{
    "success": true,
    "data": {
        "pro_config": {
            "cluster_name": "production-cluster",
            "datacenter": "datacenter1",
            "hosts": ["127.0.0.1"],
            "port": 9042,
            "username": "",
            "password": "",
            "keyspace": "production_ks",
            "table": "table_name"
        },
        "test_config": {
            "cluster_name": "test-cluster",
            "datacenter": "datacenter1",
            "hosts": ["127.0.0.1"],
            "port": 9042,
            "username": "",
            "password": "",
            "keyspace": "test_ks",
            "table": "table_name"
        }
    }
}

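2.3.2 Create a Configuration Group

Endpoint: POST /api/config-groups

Request parameters:

{
    "name": "Production environment configuration",
    "description": "Cassandra configuration group for the production environment",
    "pro_config": { /* Cassandra configuration */ },
    "test_config": { /* Cassandra configuration */ },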
    "query_config": {
        "keys": ["user_id"],
        "fields_to_compare": [],
        "exclude_fields": []
    },
    "sharding_config": {
        "use_sharding_for_pro": false,
        "use_sharding_for_test": false,
        "interval_seconds": 604800,
        "table_count": 14
    }
}

Response data:

{
    "success": true,
    "data": {
        "id": 1,
        "name": "Production environment configuration",
        "created_at": "2024-08-05T10:30:00Z"
    },
    "message": "Configuration group created"
}

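2.3.3 List Configuration Groups

Endpoint: GET /api/config-groups

Response data:

{
    "success": true,
    "data": [
        {
            "id": 1,
            "name": "Production environment configuration",
            "description": "Cassandra configuration group for the production environment",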
            "created_at": "2024-08-05T10:30:00Z",
            "updated_at": "2024-08-05T10:30:00Z"
        }
    ]
}

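2.3.4 Get a Configuration Group

Endpoint: GET /api/config-groups/{id}

Response data:

{
    "success": true,
    "data": {
        "id": 1,
        "name": "Production environment configuration",
        "description": "Cassandra configuration group for the production environment",
        "pro_config": { /* full configuration */ },
        "test_config": { /* full configuration */ },
        "query_config": { /* query configuration */ },
        "sharding_config": { /* sharding configuration */ },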
        "created_at": "2024-08-05T10:30:00Z",
        "updated_at": "2024-08-05T10:30:00Z"
    }
}

2.3.5 Delete a Configuration Group

Endpoint: DELETE /api/config-groups/{id}

Response data:

{
    "success": true,
    "data": null,
    "message": "Configuration group deleted"
}

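2.4 Query History Management API

2.4.1 List Query History

Endpoint: GET /api/query-history

Query parameters:

  • limit: maximum number of records to return (default 50)
  • offset: number of records to skip (default 0)
  • query_type: query type (single/sharding)

Response data: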

{
    "success": true,
    "data": {
        "items": [
            {
                "id": 1,
                "name": "User data comparison-20240805",
                "description": "Production user data comparison",
                "query_type": "single",
                "total_keys": 100,
                "differences_count": 5,
                "identical_count": 95,
                "execution_time": 2.5,
                "created_at": "2024-08-05T10:30:00Z"
            }
        ],
        "total": 1,
        "has_more": false
    }
}
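The relationship between `limit`, `offset`, `total`, and `has_more` can be illustrated with a small in-memory sketch. It assumes `total` counts all matching records rather than only those returned, which the document does not state; `paginate` is a hypothetical name, and a real implementation would presumably push the limit and offset into the SQLite query.

```python
def paginate(records, limit=50, offset=0):
    """Return one page of records in the shape of the list response above."""
    items = records[offset:offset + limit]
    total = len(records)  # assumed: total counts every matching record
    return {
        "items": items,
        "total": total,
        # More pages remain if this page does not reach the end of the records.
        "has_more": offset + len(items) < total,
    }
```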

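2.4.2 Save Query History

Endpoint: POST /api/query-history

Request parameters:

{
    "name": "User data comparison-20240805",
    "description": "Production user data comparison",
    "pro_config": { /* production configuration */ },
    "test_config": { /* test configuration */ },
    "query_config": { /* query configuration */ },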
    "query_keys": ["1001", "1002", "1003"],
    "results_summary": {
        "total_keys": 3,
        "differences_count": 1,
        "identical_count": 2
    },
    "execution_time": 1.25,
    "query_type": "single",
    "sharding_config": null,
    "raw_results": { /* full query results */ }
}

2.4.3 Get History Record Details

Endpoint: GET /api/query-history/{id}

Response data:

{
    "success": true,
    "data": {
        "id": 1,
        "name": "User data comparison-20240805",
        "description": "Production user data comparison",
        "pro_config": { /* full configuration */ },
        "test_config": { /* full configuration */ },
        "query_config": { /* query configuration */ },
        "query_keys": ["1001", "1002", "1003"],
        "results_summary": { /* results summary */ },
        "execution_time": 1.25,
        "query_type": "single",
        "created_at": "2024-08-05T10:30:00Z"
    }
}

2.4.4 Get Full Results for a History Record

Endpoint: GET /api/query-history/{id}/results

Response data:

{
    "success": true,
    "data": {
        "differences": [ /* full difference data */ ],
        "identical_results": [ /* full identical data */ ],
        "raw_pro_data": [ /* raw production data */ ],
        "raw_test_data": [ /* raw test data */ ],
        "field_diff_count": { /* per-field difference counts */ },
        "summary": { /* detailed analysis report */ }
    }
}

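2.5 Log Management API

2.5.1 Get Query Logs

Endpoint: GET /api/query-logs

Query parameters:

  • limit: number of records to return (default 100)
  • level: log level (INFO/WARNING/ERROR)
  • history_id: ID of the associated history record

Response data: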

{
    "success": true,
    "data": {
        "logs": [
            {
                "id": 1,
                "batch_id": "batch-uuid-123",
                "history_id": 1,
                "timestamp": "2024-08-05T10:30:01.123Z",
                "level": "INFO",
                "message": "Starting Cassandra query",
                "query_type": "cassandra_single",
                "created_at": "2024-08-05T10:30:01Z"
            },
            {
                "id": 2,
                "batch_id": "batch-uuid-123",
                "history_id": 1,
                "timestamp": "2024-08-05T10:30:02.456Z",
                "level": "INFO",
                "message": "Production query completed, 3 records returned",
                "query_type": "cassandra_single",
                "created_at": "2024-08-05T10:30:02Z"
            }
        ],
        "total": 2
    }
}

2.5.2 Get Logs for a History Record

Endpoint: GET /api/query-logs/history/{id}

Response data:

{
    "success": true,
    "data": {
        "history_id": 1,
        "logs": [ /* all logs related to this history record */ ],
        "log_summary": {
            "total_logs": 10,
            "info_count": 8,
            "warning_count": 1,
            "error_count": 1,
            "start_time": "2024-08-05T10:30:00Z",
            "end_time": "2024-08-05T10:30:05Z"
        }
    }
}

2.5.3 Clear Query Logs

Endpoint: DELETE /api/query-logs

Response data:

{
    "success": true,
    "data": {
        "deleted_count": 150
    },
    "message": "Query logs cleared"
}

2.6 System Management API

2.6.1 Initialize the Database

Endpoint: POST /api/init-db

Purpose: creates the SQLite database table structure

Response data:

{
    "success": true,
    "data": {
        "tables_created": [
            "config_groups",
            "query_history", 
            "query_logs"
        ]
    },
    "message": "Database initialized"
}

2.6.2 System Health Check

Endpoint: GET /api/health

Response data:

{
    "success": true,
    "data": {
        "status": "healthy",
        "version": "2.0.0",
        "uptime": "2 days, 3 hours, 45 minutes",
        "database": {
            "sqlite": {
                "status": "connected",
                "file_size_mb": 15.2
            }
        },
        "memory_usage": {
            "used_mb": 128.5,
            "available_mb": 3967.5
        },
        "last_check": "2024-08-05T10:30:00Z"
    }
}

3. Error Handling

3.1 Error Response Format

{
    "success": false,
    "error": {
        "code": "VALIDATION_ERROR",
        "message": "Request parameter validation failed",
        "details": {
            "field": "pro_config.hosts",
            "issue": "The hosts field must not be empty"
        }
    },
    "timestamp": "2024-08-05T10:30:00Z",
    "request_id": "uuid-string"
}

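3.2 Error Codes

| Error code | HTTP status | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Request parameter validation failed |
| CONNECTION_ERROR | 500 | Database connection failed |
| QUERY_ERROR | 500 | Query execution failed |
| TIMEOUT_ERROR | 408 | Request timed out |
| NOT_FOUND | 404 | Resource not found |
| CONFLICT | 409 | Resource conflict |
| SYSTEM_ERROR | 500 | Internal system error |
| AUTH_ERROR | 401 | Authentication failed |
| PERMISSION_DENIED | 403 | Insufficient permissions |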
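The error codes in section 3.2 and the error format in section 3.1 could be combined in a helper like the one below. This is a sketch under assumptions: `make_error` is a hypothetical name, and falling back to HTTP 500 for an unknown code is a choice made here, not something the document specifies.

```python
import uuid
from datetime import datetime, timezone

# Error code -> HTTP status, copied from the table in section 3.2.
ERROR_STATUS = {
    "VALIDATION_ERROR": 400,
    "CONNECTION_ERROR": 500,
    "QUERY_ERROR": 500,
    "TIMEOUT_ERROR": 408,
    "NOT_FOUND": 404,
    "CONFLICT": 409,
    "SYSTEM_ERROR": 500,
    "AUTH_ERROR": 401,
    "PERMISSION_DENIED": 403,
}


def make_error(code, message, details=None):
    """Return (body, http_status) for an error response (hypothetical helper)."""
    body = {
        "success": False,
        "error": {"code": code, "message": message, "details": details or {}},
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": str(uuid.uuid4()),
    }
    # Assumption: unknown codes are reported as 500.
    return body, ERROR_STATUS.get(code, 500)
```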

3.3 Detailed Error Scenarios

3.3.1 Connection Error

{
    "success": false,
    "error": {
        "code": "CONNECTION_ERROR",
        "message": "Unable to connect to the Cassandra cluster",
        "details": {
            "cluster": "production-cluster",
            "hosts": ["10.0.1.100", "10.0.1.101"],
            "error_detail": "Connection refused",
            "suggestions": [
                "Check network connectivity",
                "Verify the host addresses and port",
                "Confirm that the Cassandra service is running"
            ]
        }
    }
}

3.3.2 Query Error

{
    "success": false,
    "error": {
        "code": "QUERY_ERROR",
        "message": "CQL query execution failed",
        "details": {
            "query": "SELECT * FROM user_data WHERE user_id IN (?)",
            "error_detail": "Invalid keyspace name 'invalid_ks'",
            "suggestions": [
                "Check that the keyspace name is correct",
                "Confirm that the table name is spelled correctly",
                "Verify that the field names exist"
            ]
        }
    }
}

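4. Authentication and Authorization

4.1 Authentication

The current version does not implement authentication; all API endpoints are openly accessible. For production environments, one of the following authentication methods is recommended:

  • API key authentication: simple authentication based on an API key
  • JWT: JSON Web Token authentication
  • OAuth 2.0: the standard OAuth authorization flow
  • LDAP integration: enterprise LDAP authentication

4.2 Access Control

Role-based access control (RBAC) is recommended: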

{
    "roles": [
        {
            "name": "admin",
            "permissions": ["read", "write", "delete", "config"]
        },
        {
            "name": "operator", 
            "permissions": ["read", "write"]
        },
        {
            "name": "viewer",
            "permissions": ["read"]
        }
    ]
}
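A permission check against the role definitions above might look like the following. Since the document only recommends RBAC and does not implement it, everything here is illustrative: the `ROLES` table mirrors the JSON, and `is_allowed` is a hypothetical function name.

```python
# Role -> permissions, mirroring the JSON role definitions above.
ROLES = {
    "admin": {"read", "write", "delete", "config"},
    "operator": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(role, permission):
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLES.get(role, set())
```

Under this sketch, a viewer could call read-only endpoints, while clearing logs or deleting a configuration group would presumably require the `delete` permission.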

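5. API Version Management

5.1 Versioning Strategy

  • URL versioning: /api/v1/query, /api/v2/query
  • Header versioning: Accept: application/vnd.datatools.v1+json
  • Backward compatibility: older API versions remain compatible
  • Deprecation policy: API deprecations are announced in advance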
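Resolving the requested version from either the URL or the Accept header can be sketched with two regular expressions. The precedence (URL first, then header, then a default of 1) is an assumption, and `resolve_version` is a hypothetical name.

```python
import re

# Patterns for the two versioning styles listed above.
URL_VERSION = re.compile(r"^/api/v(\d+)/")
HEADER_VERSION = re.compile(r"application/vnd\.datatools\.v(\d+)\+json")


def resolve_version(path, accept_header="", default=1):
    """Return the requested API version as an int.

    Assumed precedence: URL prefix, then Accept header, then the default.
    """
    match = URL_VERSION.match(path) or HEADER_VERSION.search(accept_header)
    return int(match.group(1)) if match else default
```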

5.2 Version Changelog

| API version | Release date | Main changes | Compatibility |
|---|---|---|---|
| v1.0 | 2024-08-05 | Initial release | N/A |

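6. Performance and Limits

6.1 API Limits

  • Request rate: at most 100 requests per minute
  • Concurrent connections: at most 10
  • Response size: at most 50 MB per response
  • Query timeout: 120 seconds by default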
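The limit of 100 requests per minute could be enforced in several ways; the document does not say which. The sketch below uses a sliding-window counter as one option. The class name is hypothetical, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests=100, window_seconds=60.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self):
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True
```

A real deployment would likely keep one limiter per client (for example per API key or IP address) rather than a single global one; that scoping is not specified here.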

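6.2 Performance Optimization

  • Connection pooling: reuse database connections
  • Caching: cache configuration data
  • Asynchronous processing: run long queries asynchronously
  • Pagination: return large data sets in pages

7. Monitoring and Logging

7.1 API Monitoring Metrics

  • Response time: average and 95th percentile
  • Success rate: share of API calls that succeed
  • Error rate: rate of each error type
  • Throughput: requests handled per second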
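The four metrics above can be computed from a list of request samples, as in the sketch below. The sample shape (duration in seconds plus a success flag), the nearest-rank percentile method, and the function names are all assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, window_seconds):
    """Summarize (duration_seconds, succeeded) samples collected over a window."""
    if not samples:
        raise ValueError("at least one sample is required")
    durations = [duration for duration, _ in samples]
    succeeded = sum(1 for _, ok in samples if ok)
    success_rate = 100.0 * succeeded / len(samples)
    return {
        "avg_response_time": sum(durations) / len(durations),
        "p95_response_time": percentile(durations, 95),
        "success_rate": success_rate,
        "error_rate": 100.0 - success_rate,
        "throughput_rps": len(samples) / window_seconds,
    }
```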

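7.2 Logging

  • Access log: records every API request
  • Error log: detailed error messages and stack traces
  • Performance log: slow queries and performance bottlenecks
  • Audit log: audit records for important operations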

Version: v1.0
Last updated: 2024-08-05
Maintainer: DataTools Pro Team