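# DataTools Pro API Design Document
## 1. API Overview
### 1.1 Design Principles
- **RESTful design**: follows the REST architectural style
- **Uniform format**: standardized request and response formats
- **Versioning**: supports API version management
- **Error handling**: complete error codes and error messages
- **Security**: input validation and access control
### 1.2 Basic Information
- **Base URL**: `http://localhost:5000`
- **Content-Type**: `application/json`
- **Character encoding**: `UTF-8`
- **API version**: `v1.0`
### 1.3 Response Format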
```json
{
"success": true,
"data": {},
"message": "Operation succeeded",
"timestamp": "2024-08-05T10:30:00Z",
"request_id": "uuid-string"
}
```
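As a minimal sketch of how a handler might build this envelope, the helper below fills in the five fields shown above. The function name `make_envelope` is hypothetical, and using a random UUID for `request_id` is an assumption; the document only shows the placeholder `"uuid-string"`.

```python
import uuid
from datetime import datetime, timezone


def make_envelope(data, message, success=True):
    """Wrap a payload in the response envelope described in section 1.3."""
    return {
        "success": success,
        "data": data,
        "message": message,
        # UTC timestamp in the same "Z"-suffixed ISO 8601 form as the example
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: a random UUID stands in for the "uuid-string" placeholder
        "request_id": str(uuid.uuid4()),
    }
```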
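## 2. Core API Endpoints
### 2.1 Cassandra Data Comparison API
#### 2.1.1 Run a Single-Table Query Comparison
**Endpoint**: `POST /api/query`
**Description**: Queries a single Cassandra table in both environments and compares the results.
**Request parameters**: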
```json
{
"pro_config": {
"cluster_name": "production-cluster",
"datacenter": "datacenter1",
"hosts": ["10.0.1.100", "10.0.1.101"],
"port": 9042,
"username": "cassandra",
"password": "password",
"keyspace": "production_ks",
"table": "user_data"
},
"test_config": {
"cluster_name": "test-cluster",
"datacenter": "datacenter1",
"hosts": ["10.0.2.100"],
"port": 9042,
"username": "cassandra",
"password": "password",
"keyspace": "test_ks",
"table": "user_data"
},
"keys": ["user_id"],
"values": ["1001", "1002", "1003"],
"fields_to_compare": ["name", "email", "status"],
"exclude_fields": ["created_at", "updated_at"]
}
```
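A client could prepare this call with nothing but the standard library. The sketch below only builds the request object; the helper name `build_query_request` is hypothetical, and the base URL is the one listed in section 1.2.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # Base URL from section 1.2


def build_query_request(payload):
    """Build (but do not send) a POST /api/query request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=120)` and decode the response body with `json.load`.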
**Response data**:
```json
{
"success": true,
"data": {
"total_keys": 3,
"pro_count": 3,
"test_count": 2,
"differences": [
{
"key": {"user_id": "1001"},
"field": "email",
"pro_value": "user1@prod.com",
"test_value": "user1@test.com",
"message": "Field values do not match"
}
],
"identical_results": [
{
"key": {"user_id": "1002"},
"pro_fields": {"name": "User2", "email": "user2@example.com"},
"test_fields": {"name": "User2", "email": "user2@example.com"}
}
],
"field_diff_count": {
"email": 1
},
"raw_pro_data": [...],
"raw_test_data": [...],
"summary": {
"overview": "Queried 3 keys and found 1 difference",
"percentages": {
"match_rate": 66.67,
"diff_rate": 33.33
},
"field_analysis": {
"email": {"diff_count": 1, "diff_rate": 33.33}
},
"recommendations": ["Check data synchronization for the email field"]
}
},
"message": "Query comparison completed",
"execution_time": 1.25,
"timestamp": "2024-08-05T10:30:00Z"
}
```
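The `field_diff_count` and `summary` figures above can be derived from the `differences` list. The sketch below is one way to do it, assuming that `diff_rate` is the share of queried keys with at least one differing field and that `match_rate` is its complement. The document does not define these rates, and the function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(total_keys, differences):
    """Derive per-field counts and percentage rates from a differences list."""
    field_diff_count = Counter(d["field"] for d in differences)
    # Distinct keys that have at least one differing field
    diff_keys = {tuple(sorted(d["key"].items())) for d in differences}

    def rate(count):
        return round(count / total_keys * 100, 2) if total_keys else 0.0

    return {
        "field_diff_count": dict(field_diff_count),
        "percentages": {
            "match_rate": rate(total_keys - len(diff_keys)),
            "diff_rate": rate(len(diff_keys)),
        },
        "field_analysis": {
            field: {"diff_count": n, "diff_rate": rate(n)}
            for field, n in field_diff_count.items()
        },
    }
```

With the sample data (3 keys, one `email` difference) this yields a `match_rate` of 66.67 and a `diff_rate` of 33.33, matching the response above.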
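#### 2.1.2 Run a Sharded-Table Query Comparison
**Endpoint**: `POST /api/sharding-query`
**Description**: Queries sharded Cassandra tables and compares the results.
**Request parameters**:
```json
{
"pro_config": { /* same as the single-table query configuration */ },
"test_config": { /* same as the single-table query configuration */ },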
"keys": ["doc_id"],
"values": ["wmid_1609459200", "wmid_1609545600"],
"fields_to_compare": ["content", "status"],
"exclude_fields": [],
"sharding_config": {
"use_sharding_for_pro": true,
"use_sharding_for_test": false,
"interval_seconds": 604800,
"table_count": 14
}
}
```
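This document does not state how a key is mapped to a shard table. The sketch below shows one plausible scheme under loudly stated assumptions: the timestamp is the trailing run of digits in the key value, and the shard index is `(timestamp // interval_seconds) % table_count`. Both function names are hypothetical, and the table names this formula produces need not match the sample mapping in the response example.

```python
import re


def extract_timestamp(key_value):
    """Return the trailing integer of a key such as 'wmid_1609459200', or None."""
    match = re.search(r"(\d+)$", key_value)
    return int(match.group(1)) if match else None


def shard_table(base_table, key_value, interval_seconds, table_count):
    """Pick a shard table name; returns None when no timestamp can be extracted."""
    ts = extract_timestamp(key_value)
    if ts is None:
        return None  # such keys would presumably be reported in failed_keys
    # Assumed formula: bucket the timestamp by interval, then wrap around
    index = (ts // interval_seconds) % table_count
    return f"{base_table}_{index}"
```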
**Response data**:
```json
{
"success": true,
"data": {
/* base comparison results, same as the single-table query */
"sharding_info": {
"pro_shard_mapping": {
"wmid_1609459200": "user_data_0",
"wmid_1609545600": "user_data_1"
},
"test_shard_mapping": {
"wmid_1609459200": "user_data",
"wmid_1609545600": "user_data"
},
"failed_keys": [],
"shard_stats": {
"pro_tables_used": ["user_data_0", "user_data_1"],
"test_tables_used": ["user_data"],
"timestamp_extraction_success_rate": 100.0
}
}
},
"message": "Sharded query comparison completed",
"execution_time": 2.15
}
```
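### 2.2 Redis Cluster Comparison API
#### 2.2.1 Run a Redis Cluster Comparison
**Endpoint**: `POST /api/redis/compare`
**Description**: Compares data between two Redis clusters.
**Request parameters**:
```json
{
"cluster1_config": {
"name": "Production cluster",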
"nodes": [
{"host": "10.0.1.100", "port": 6379},
{"host": "10.0.1.101", "port": 6380}
],
"password": "redis_password",
"socket_timeout": 3,
"socket_connect_timeout": 3,
"max_connections_per_node": 16
},
"cluster2_config": {
"name": "Test cluster",
"nodes": [{"host": "10.0.2.100", "port": 6379}],
"password": null
},
"query_mode": "specified",
"keys": ["user:1001", "user:1002", "session:abc123"],
"sample_config": {
"count": 100,
"pattern": "*",
"source_cluster": "cluster2"
}
}
```
**Response data**:
```json
{
"success": true,
"data": {
"total_keys": 3,
"cluster1_found": 2,
"cluster2_found": 3,
"differences": [
{
"key": "user:1001",
"cluster1_value": "{\"name\":\"John\",\"age\":25}",
"cluster2_value": "{\"name\":\"John\",\"age\":26}",
"value_type": "string",
"difference_type": "value_mismatch"
}
],
"identical": [
{
"key": "user:1002",
"value": "{\"name\":\"Jane\",\"age\":30}",
"value_type": "string"
}
],
"missing_in_cluster1": ["session:abc123"],
"missing_in_cluster2": [],
"cluster_stats": {
"cluster1": {
"connection_status": "connected",
"response_time_avg": 0.15,
"nodes_status": [
{"host": "10.0.1.100", "port": 6379, "status": "connected"},
{"host": "10.0.1.101", "port": 6380, "status": "connected"}
]
},
"cluster2": {
"connection_status": "connected",
"response_time_avg": 0.12,
"nodes_status": [
{"host": "10.0.2.100", "port": 6379, "status": "connected"}
]
}
},
"performance_summary": {
"total_execution_time": 0.85,
"keys_per_second": 3.53,
"data_transferred_kb": 2.1
}
},
"message": "Redis cluster comparison completed"
}
```
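The four result buckets in this response (`differences`, `identical`, `missing_in_cluster1`, `missing_in_cluster2`) can be illustrated with a small classifier. This is a sketch only: it assumes the values have already been fetched into two plain dictionaries, and the function name `compare_clusters` is hypothetical.

```python
def compare_clusters(keys, cluster1_values, cluster2_values):
    """Classify each key by comparing its value in two already-fetched dicts."""
    result = {
        "differences": [],
        "identical": [],
        "missing_in_cluster1": [],
        "missing_in_cluster2": [],
    }
    for key in keys:
        in_c1 = key in cluster1_values
        in_c2 = key in cluster2_values
        if not in_c1:
            result["missing_in_cluster1"].append(key)
        if not in_c2:
            result["missing_in_cluster2"].append(key)
        if not (in_c1 and in_c2):
            continue  # nothing to compare when either side is missing
        v1, v2 = cluster1_values[key], cluster2_values[key]
        if v1 == v2:
            result["identical"].append({"key": key, "value": v1})
        else:
            result["differences"].append({
                "key": key,
                "cluster1_value": v1,
                "cluster2_value": v2,
                "difference_type": "value_mismatch",
            })
    return result
```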
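### 2.3 Configuration Management API
#### 2.3.1 Get the Default Configuration
**Endpoint**: `GET /api/default-config`
**Description**: Returns the system's default database configuration.
**Response data**: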
```json
{
"success": true,
"data": {
"pro_config": {
"cluster_name": "production-cluster",
"datacenter": "datacenter1",
"hosts": ["127.0.0.1"],
"port": 9042,
"username": "",
"password": "",
"keyspace": "production_ks",
"table": "table_name"
},
"test_config": {
"cluster_name": "test-cluster",
"datacenter": "datacenter1",
"hosts": ["127.0.0.1"],
"port": 9042,
"username": "",
"password": "",
"keyspace": "test_ks",
"table": "table_name"
}
}
}
```
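#### 2.3.2 Create a Configuration Group
**Endpoint**: `POST /api/config-groups`
**Request parameters**:
```json
{
"name": "Production configuration",
"description": "Cassandra configuration group for the production environment",
"pro_config": { /* Cassandra configuration */ },
"test_config": { /* Cassandra configuration */ },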
"query_config": {
"keys": ["user_id"],
"fields_to_compare": [],
"exclude_fields": []
},
"sharding_config": {
"use_sharding_for_pro": false,
"use_sharding_for_test": false,
"interval_seconds": 604800,
"table_count": 14
}
}
```
**Response data**:
```json
{
"success": true,
"data": {
"id": 1,
"name": "Production configuration",
"created_at": "2024-08-05T10:30:00Z"
},
"message": "Configuration group created"
}
```
#### 2.3.3 List Configuration Groups
**Endpoint**: `GET /api/config-groups`
**Response data**:
```json
{
"success": true,
"data": [
{
"id": 1,
"name": "Production configuration",
"description": "Cassandra configuration group for the production environment",
"created_at": "2024-08-05T10:30:00Z",
"updated_at": "2024-08-05T10:30:00Z"
}
]
}
```
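#### 2.3.4 Get a Specific Configuration Group
**Endpoint**: `GET /api/config-groups/{id}`
**Response data**:
```json
{
"success": true,
"data": {
"id": 1,
"name": "Production configuration",
"description": "Cassandra configuration group for the production environment",
"pro_config": { /* full configuration */ },
"test_config": { /* full configuration */ },
"query_config": { /* query configuration */ },
"sharding_config": { /* sharding configuration */ },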
"created_at": "2024-08-05T10:30:00Z",
"updated_at": "2024-08-05T10:30:00Z"
}
}
```
#### 2.3.5 Delete a Configuration Group
**Endpoint**: `DELETE /api/config-groups/{id}`
**Response data**:
```json
{
"success": true,
"data": null,
"message": "Configuration group deleted"
}
```
### 2.4 Query History API
#### 2.4.1 List Query History
**Endpoint**: `GET /api/query-history`
**Query parameters**:
- `limit`: maximum number of records to return (default 50)
- `offset`: offset (default 0)
- `query_type`: query type (`single`/`sharding`)
**Response data**:
```json
{
"success": true,
"data": {
"items": [
{
"id": 1,
"name": "User data comparison-20240805",
"description": "Production user data comparison",
"query_type": "single",
"total_keys": 100,
"differences_count": 5,
"identical_count": 95,
"execution_time": 2.5,
"created_at": "2024-08-05T10:30:00Z"
}
],
"total": 1,
"has_more": false
}
}
```
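The `limit`, `offset`, `total`, and `has_more` fields fit a standard offset-based pagination scheme. The sketch below assumes `has_more` is true exactly when records remain beyond the returned page. The function name `paginate` is hypothetical, and the in-memory list stands in for a database query.

```python
def paginate(records, limit=50, offset=0):
    """Slice a record list into one page and report whether more pages remain."""
    items = records[offset:offset + limit]
    total = len(records)
    return {
        "items": items,
        "total": total,
        # True when at least one record lies beyond the end of this page
        "has_more": offset + len(items) < total,
    }
```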
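#### 2.4.2 Save Query History
**Endpoint**: `POST /api/query-history`
**Request parameters**:
```json
{
"name": "User data comparison-20240805",
"description": "Production user data comparison",
"pro_config": { /* production configuration */ },
"test_config": { /* test configuration */ },
"query_config": { /* query configuration */ },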
"query_keys": ["1001", "1002", "1003"],
"results_summary": {
"total_keys": 3,
"differences_count": 1,
"identical_count": 2
},
"execution_time": 1.25,
"query_type": "single",
"sharding_config": null,
"raw_results": { /* full query results */ }
}
```
#### 2.4.3 Get History Record Details
**Endpoint**: `GET /api/query-history/{id}`
**Response data**:
```json
{
"success": true,
"data": {
"id": 1,
"name": "User data comparison-20240805",
"description": "Production user data comparison",
"pro_config": { /* full configuration */ },
"test_config": { /* full configuration */ },
"query_config": { /* query configuration */ },
"query_keys": ["1001", "1002", "1003"],
"results_summary": { /* results summary */ },
"execution_time": 1.25,
"query_type": "single",
"created_at": "2024-08-05T10:30:00Z"
}
}
```
#### 2.4.4 Get Full Results for a History Record
**Endpoint**: `GET /api/query-history/{id}/results`
**Response data**:
```json
{
"success": true,
"data": {
"differences": [ /* full difference data */ ],
"identical_results": [ /* full identical data */ ],
"raw_pro_data": [ /* raw production data */ ],
"raw_test_data": [ /* raw test data */ ],
"field_diff_count": { /* per-field difference counts */ },
"summary": { /* detailed analysis report */ }
}
}
```
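### 2.5 Log Management API
#### 2.5.1 Get Query Logs
**Endpoint**: `GET /api/query-logs`
**Query parameters**:
- `limit`: number of records to return (default 100)
- `level`: log level (`INFO`/`WARNING`/`ERROR`)
- `history_id`: ID of the associated history record
**Response data**: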
```json
{
"success": true,
"data": {
"logs": [
{
"id": 1,
"batch_id": "batch-uuid-123",
"history_id": 1,
"timestamp": "2024-08-05T10:30:01.123Z",
"level": "INFO",
"message": "Starting Cassandra query",
"query_type": "cassandra_single",
"created_at": "2024-08-05T10:30:01Z"
},
{
"id": 2,
"batch_id": "batch-uuid-123",
"history_id": 1,
"timestamp": "2024-08-05T10:30:02.456Z",
"level": "INFO",
"message": "Production query completed, returned 3 records",
"query_type": "cassandra_single",
"created_at": "2024-08-05T10:30:02Z"
}
],
"total": 2
}
}
```
#### 2.5.2 Get Logs for a Specific History Record
**Endpoint**: `GET /api/query-logs/history/{id}`
**Response data**:
```json
{
"success": true,
"data": {
"history_id": 1,
"logs": [ /* all logs related to this history record */ ],
"log_summary": {
"total_logs": 10,
"info_count": 8,
"warning_count": 1,
"error_count": 1,
"start_time": "2024-08-05T10:30:00Z",
"end_time": "2024-08-05T10:30:05Z"
}
}
}
```
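A `log_summary` like the one above could be computed from the log entries themselves. The sketch below assumes each entry carries the `level` and `timestamp` fields shown in section 2.5.1; the function name `summarize_logs` is hypothetical. Timestamps are parsed before comparison because plain string ordering misplaces values with fractional seconds.

```python
from collections import Counter
from datetime import datetime


def _parse(ts):
    """Parse a 'Z'-suffixed ISO 8601 timestamp into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_logs(logs):
    """Count log entries per level and find the earliest and latest timestamps."""
    levels = Counter(log["level"] for log in logs)
    stamps = [log["timestamp"] for log in logs]
    return {
        "total_logs": len(logs),
        "info_count": levels.get("INFO", 0),
        "warning_count": levels.get("WARNING", 0),
        "error_count": levels.get("ERROR", 0),
        "start_time": min(stamps, key=_parse) if stamps else None,
        "end_time": max(stamps, key=_parse) if stamps else None,
    }
```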
#### 2.5.3 Clear Query Logs
**Endpoint**: `DELETE /api/query-logs`
**Response data**:
```json
{
"success": true,
"data": {
"deleted_count": 150
},
"message": "Query logs cleared"
}
```
### 2.6 System Management API
#### 2.6.1 Initialize the Database
**Endpoint**: `POST /api/init-db`
**Description**: Creates the SQLite database table schema.
**Response data**:
```json
{
"success": true,
"data": {
"tables_created": [
"config_groups",
"query_history",
"query_logs"
]
},
"message": "Database initialized"
}
```
#### 2.6.2 System Health Check
**Endpoint**: `GET /api/health`
**Response data**:
```json
{
"success": true,
"data": {
"status": "healthy",
"version": "2.0.0",
"uptime": "2 days, 3 hours, 45 minutes",
"database": {
"sqlite": {
"status": "connected",
"file_size_mb": 15.2
}
},
"memory_usage": {
"used_mb": 128.5,
"available_mb": 3967.5
},
"last_check": "2024-08-05T10:30:00Z"
}
}
```
## 3. Error Handling
### 3.1 Error Response Format
```json
{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Request parameter validation failed",
"details": {
"field": "pro_config.hosts",
"issue": "The hosts field must not be empty"
}
},
"timestamp": "2024-08-05T10:30:00Z",
"request_id": "uuid-string"
}
```
### 3.2 Error Codes
| Error code | HTTP status | Description |
|--------|-----------|------|
| `VALIDATION_ERROR` | 400 | Request parameter validation failed |
| `CONNECTION_ERROR` | 500 | Database connection failed |
| `QUERY_ERROR` | 500 | Query execution failed |
| `TIMEOUT_ERROR` | 408 | Request timed out |
| `NOT_FOUND` | 404 | Resource does not exist |
| `CONFLICT` | 409 | Resource conflict |
| `SYSTEM_ERROR` | 500 | Internal system error |
| `AUTH_ERROR` | 401 | Authentication failed |
| `PERMISSION_DENIED` | 403 | Insufficient permissions |
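The table above maps directly to a lookup dictionary. The sketch below pairs it with a helper that builds the error body from section 3.1; the name `make_error` and the fallback to 500 for unknown codes are assumptions, and the `timestamp` and `request_id` fields are omitted for brevity.

```python
# Error code to HTTP status, copied from the table in section 3.2
ERROR_STATUS = {
    "VALIDATION_ERROR": 400,
    "CONNECTION_ERROR": 500,
    "QUERY_ERROR": 500,
    "TIMEOUT_ERROR": 408,
    "NOT_FOUND": 404,
    "CONFLICT": 409,
    "SYSTEM_ERROR": 500,
    "AUTH_ERROR": 401,
    "PERMISSION_DENIED": 403,
}


def make_error(code, message, details=None):
    """Return (body, http_status) for an error response."""
    body = {
        "success": False,
        "error": {"code": code, "message": message, "details": details or {}},
    }
    # Assumption: unknown codes are treated as internal errors
    return body, ERROR_STATUS.get(code, 500)
```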
### 3.3 Detailed Error Scenarios
#### 3.3.1 Connection Error
```json
{
"success": false,
"error": {
"code": "CONNECTION_ERROR",
"message": "Unable to connect to the Cassandra cluster",
"details": {
"cluster": "production-cluster",
"hosts": ["10.0.1.100", "10.0.1.101"],
"error_detail": "Connection refused",
"suggestions": [
"Check network connectivity",
"Verify the host addresses and port",
"Confirm the Cassandra service status"
]
}
}
}
```
#### 3.3.2 Query Error
```json
{
"success": false,
"error": {
"code": "QUERY_ERROR",
"message": "CQL query execution failed",
"details": {
"query": "SELECT * FROM user_data WHERE user_id IN (?)",
"error_detail": "Invalid keyspace name 'invalid_ks'",
"suggestions": [
"Check that the keyspace name is correct",
"Confirm the table name is spelled correctly",
"Verify that the field names exist"
]
}
}
}
```
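## 4. Authentication and Authorization
### 4.1 Authentication
The current version does not implement authentication; all API endpoints are openly accessible. For production environments, consider implementing one of the following authentication methods:
- **API key authentication**: simple authentication based on an API key
- **JWT token**: JSON Web Token authentication
- **OAuth 2.0**: the standard OAuth authorization flow
- **LDAP integration**: enterprise LDAP authentication
### 4.2 Access Control
Role-based access control (RBAC) is recommended: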
```json
{
"roles": [
{
"name": "admin",
"permissions": ["read", "write", "delete", "config"]
},
{
"name": "operator",
"permissions": ["read", "write"]
},
{
"name": "viewer",
"permissions": ["read"]
}
]
}
```
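A permission check against these roles can be a simple set lookup. The sketch below mirrors the three roles in the JSON above; the function name `is_allowed` is hypothetical, and denying unknown roles is an assumption.

```python
# Role to permissions, mirroring the RBAC example in section 4.2
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete", "config"},
    "operator": {"read", "write"},
    "viewer": {"read"},
}


def is_allowed(role, permission):
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```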
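## 5. API Version Management
### 5.1 Versioning Strategy
- **URL versioning**: `/api/v1/query`, `/api/v2/query`
- **Header versioning**: `Accept: application/vnd.datatools.v1+json`
- **Backward compatibility**: older API versions remain compatible
- **Deprecation policy**: API deprecations are announced in advance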
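The two versioning styles listed above can coexist if the server resolves the version in a fixed order. The sketch below checks the URL first, then the `Accept` header, then falls back to a default. That precedence, the default of 1, and the function name `resolve_version` are all assumptions rather than documented behavior.

```python
import re

_URL_VERSION = re.compile(r"^/api/v(\d+)/")
_ACCEPT_VERSION = re.compile(r"application/vnd\.datatools\.v(\d+)\+json")


def resolve_version(path, accept_header="", default=1):
    """Resolve the requested API version from the URL path or Accept header."""
    match = _URL_VERSION.match(path)
    if match:
        return int(match.group(1))
    match = _ACCEPT_VERSION.search(accept_header)
    if match:
        return int(match.group(1))
    return default
```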
### 5.2 Version Change Log
| API version | Release date | Main changes | Compatibility |
|---------|----------|----------|--------|
| v1.0 | 2024-08-05 | Initial release | N/A |
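## 6. Performance and Limits
### 6.1 API Limits
- **Request rate**: at most 100 requests per minute
- **Concurrent connections**: at most 10 concurrent connections
- **Response size**: at most 50 MB per response
- **Query timeout**: 120 seconds by default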
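The request-rate limit above could be enforced with a sliding-window counter. The sketch below is a single-process, in-memory illustration with a hypothetical class name `SlidingWindowLimiter`; the document does not say whether the limit applies per client or globally, so that scoping is left to the caller.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds-long window."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now=None):
        """Record and accept the request if under the limit, else reject it."""
        now = time.monotonic() if now is None else now
        # Drop hits that have aged out of the window
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True
```

The optional `now` argument exists so the limiter can be exercised with fixed timestamps instead of the real clock.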
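### 6.2 Performance Optimization
- **Connection pooling**: reuse database connections
- **Caching**: cache configuration data
- **Asynchronous processing**: run long queries asynchronously
- **Pagination**: return large result sets in pages
## 7. Monitoring and Logging
### 7.1 API Monitoring Metrics
- **Response time**: average response time and 95th percentile
- **Success rate**: share of API calls that succeed
- **Error rate**: rate of each error type
- **Throughput**: requests processed per second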
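For the response-time metric, the average and 95th percentile can be computed from a list of recorded durations. The sketch below uses the nearest-rank percentile definition, which is one of several common definitions and is an assumption here; the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def response_time_stats(samples):
    """Return the mean and 95th-percentile response time for a batch of samples."""
    return {
        "avg": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
    }
```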
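### 7.2 Logging
- **Access log**: records every API request
- **Error log**: detailed error messages and stack traces
- **Performance log**: slow queries and performance bottlenecks
- **Audit log**: audit records for important operations
---
**Version**: v1.0
**Last updated**: 2024-08-05
**Maintainer**: DataTools Pro Team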