Compare commits

...

37 Commits

Author SHA1 Message Date
fe2803f3da 修复 部分 json 数据不能识别
修复 标签字段比对不完全
2025-08-14 16:32:44 +08:00
9ed05dd58d 修复比对错误 2025-08-14 14:43:24 +08:00
211bcc9066 新增意见展开按钮 删除冗余查看按钮 修复差异字段查询 2025-08-12 16:48:00 +08:00
8c3f3df826 新增一键复制差异主键 2025-08-12 16:38:02 +08:00
8d7c4e3730 修改模版内容 2025-08-12 16:27:21 +08:00
cdf7e36ba3 修改模版内容 2025-08-12 16:27:00 +08:00
fbca92ba77 修改文档 2025-08-11 14:09:57 +08:00
0b1dc6b8ca 修复展示错误 2025-08-11 14:07:10 +08:00
eabca97350 项目打包 2025-08-11 09:34:51 +08:00
01e323a7ba 项目打包 2025-08-11 09:34:45 +08:00
d42cefd9ca 完善页面 2025-08-11 09:34:29 +08:00
0ac375eb50 增加文档 2025-08-05 23:27:25 +08:00
8097e9b769 优化页面布局 2025-08-05 20:52:26 +08:00
867bf67b16 优化项目整合内容 2025-08-05 20:11:28 +08:00
4a0800a776 优化项目整合内容 2025-08-05 19:56:38 +08:00
3f78ce7365 优化项目整合内容 2025-08-05 11:23:49 +08:00
701a9a552e 修复redis识别 2025-08-04 22:07:42 +08:00
3272525c92 修复reids 2025-08-04 21:55:48 +08:00
07467d27ae 修复reids 2025-08-04 21:55:35 +08:00
dbf4255aea 自动记录日志 2025-08-04 15:41:59 +08:00
4c4d168471 增加Redis查询比对 2025-08-04 09:14:27 +08:00
e1a566012d 增加分表查询 2025-08-03 18:50:24 +08:00
7aee03b7b9 完整查询历史 2025-08-03 11:40:50 +08:00
111ac64592 完善查询日志 2025-08-03 10:57:47 +08:00
8e340c801f 完善查询日志 2025-08-03 10:52:45 +08:00
313319e2bb 完善查询历史记录 2025-08-03 10:28:09 +08:00
f674373401 完善查询历史记录 2025-08-02 23:09:47 +08:00
36915c45ea 完善查询历史记录 2025-08-02 22:47:39 +08:00
9cfc363227 完善查询日志分组 2025-08-02 22:33:23 +08:00
eb48cf17e6 完善查询 2025-08-02 22:21:45 +08:00
c3fba1b248 完善查询 2025-08-02 22:07:43 +08:00
eae6a14272 完善查询 2025-08-02 21:37:19 +08:00
1a75dcd0fc 分表参数 2025-08-02 21:14:26 +08:00
8faaabe3ba 分表参数 2025-08-02 20:47:18 +08:00
5d61060a72 删掉无用日志 2025-08-02 18:59:44 +08:00
929eb9abc1 修改页面 2025-08-02 17:56:56 +08:00
6558541a6f 删除以前版本文件 2025-08-02 16:55:33 +08:00
27 changed files with 12878 additions and 2181 deletions

132
.dockerignore Normal file
View File

@@ -0,0 +1,132 @@
# BigDataTool Docker build ignore file
# Excludes files and directories that should not be packed into the image
# Version control
.git
.gitignore
.gitattributes
# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
# Python cache files
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Virtual environments
venv/
env/
ENV/
env.bak/
venv.bak/
# Test files
.tox/
.coverage
.pytest_cache/
htmlcov/
.coverage.*
coverage.xml
*.cover
.hypothesis/
# Documentation builds
docs/_build/
.sphinx/
# Log files
*.log
logs/*.log
# Local database files (optional; remove these lines if the database should be persisted)
*.db
*.sqlite
*.sqlite3
# Configuration files (sensitive information)
.env
.env.local
.env.*.local
*.pem
*.key
config/secrets/
# Docker-related files
Dockerfile.dev
docker-compose.dev.yml
docker-compose.override.yml
# Temporary files
*.tmp
*.temp
tmp/
temp/
# macOS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Windows
*.exe
*.msi
*.msm
*.msp
# Linux
*~
# Backup files
*.bak
*.backup
*.old
# Local development files
local/
.local/
dev/
# Cache directories
.cache/
*.cache
# Runtime files
*.pid
*.sock
# Monitoring and profiling
*.prof
*.pstats
# Other unneeded files
README.dev.md
CONTRIBUTING.md
.github/
scripts/dev/

295
CLAUDE.md
View File

@@ -1,295 +0,0 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Architecture
This is a Flask-based database query comparison tool for comparing data between production and test Cassandra environments. It now supports two modes: single-table queries and TWCS sharded-table queries.
### Core Component Architecture
**Backend (Flask)**
- `app.py`: main application file containing all API endpoints and data-processing logic
- Database connection management (Cassandra + SQLite)
- Query execution and result-comparison algorithms
- CRUD operations for configuration groups
- Special handling of JSON fields and array-comparison logic
- Query history management
- **Sharded-table queries (new)**
- `ShardingCalculator`: TWCS time-based shard calculator
- `execute_sharding_query()`: sharded query execution
- `execute_mixed_query()`: mixed-query support (e.g., production sharded + test single-table)
- `/api/sharding-query`: sharded-query API endpoint
- `config_groups.db`: SQLite database storing saved configuration groups, query history, and sharding configs
**Frontend (vanilla JavaScript + Bootstrap)**
- `templates/db_compare.html`: main UI template, **now supports both single-table and sharded modes**
- Sharding mode toggle switch
- Independent sharding configuration for production and test environments
- Sharding parameters (time interval, shard count)
- Sharded-query info display
- `templates/index.html`: tool collection home page
- `static/js/app.js`: core frontend logic
- Configuration management and form handling
- Paginated display of difference results
- Raw-data display (view modes: formatted, raw, diff, tree)
- Advanced error handling and user feedback
- **Sharded-query support (new)**
- `toggleShardingMode()`: toggle sharding mode
- `getShardingConfig()`: read the sharding configuration
- `displayShardingInfo()`: display sharded-query results
**Sharded-Query Feature Module (important addition)**
- **Timestamp extraction algorithm (updated)**
- **New rule**: use `re.sub(r'\D', '', key)` to strip all non-digit characters from the key
- Convert the extracted digit string to an integer and use it as the timestamp
- Works with keys of any format, as long as they contain digits
- Examples: `wmid_1609459200` → `1609459200`; `abc123def456` → `123456`
- **Shard index calculation**
- Formula: `int(numbers) // interval_seconds % table_count`
- Defaults: 604800-second interval (7 days), 14 shard tables
- Custom configurations supported
- **Mixed-query scenarios**
- Production sharded + test single-table
- Production sharded + test sharded
- Production single-table + test sharded
- Production single-table + test single-table
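The extraction and index rules above can be sketched in a few lines of Python. The helper name `shard_table_for_key` and the `{base_table}_{index}` naming pattern are illustrative assumptions, not the project's actual API:

```python
import re

def shard_table_for_key(key, base_table, interval_seconds=604800, table_count=14):
    """Compute the shard table name for a key using the documented rules."""
    digits = re.sub(r'\D', '', key)  # strip every non-digit character
    if not digits:
        return None                  # no digits: the key would land in failed_keys
    timestamp = int(digits)          # the digit string is used as the timestamp
    index = timestamp // interval_seconds % table_count
    return f"{base_table}_{index}"

print(shard_table_for_key("wmid_1609459200", "events"))
```

Note that because validity is not checked, any digit run (e.g. `abc123def456` → `123456`) produces an index; only keys with no digits at all fail.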
**Example code**
- `demo/Query.py`: standalone Cassandra query-comparison script example
- `demo/twcsQuery.py`: another query example
- `demo/CalculationLibrary.py`: reference implementation of the shard-calculation logic
- `test_sharding.py`: sharding feature test script
### Key Feature Modules
**Data comparison engine**
- Deep comparison of complex JSON fields
- Order-independent comparison of array fields
- Field-level difference statistics and analysis
- Data quality assessment and recommendation generation
- Supports including or excluding specific fields from the comparison
**UI features**
- Pagination (for both differing and identical records)
- Real-time search and filtering
- Raw-data display (JSON syntax highlighting, tree view)
- Configuration import/export and management
- Detailed error diagnostics and troubleshooting guide
- Query history and replay
- **Query log system (new)**
- Real-time display of SQL execution logs
- Log-level filtering (INFO/WARNING/ERROR)
- SQL syntax highlighting
- Execution-time and record-count statistics
- Log clearing and refresh
## Development Commands
### Environment setup
```bash
# Install dependencies (only Flask and cassandra-driver are required)
pip install -r requirements.txt
# Run the application (default port 5000)
python app.py
# Run on a custom port:
# change the last line of app.py to app.run(debug=True, port=5001)
```
### Testing and validation
```bash
# Run the sharding tests (timestamp extraction and shard index calculation)
python test_sharding.py
# Test the new shard-calculation rules
python test_new_sharding.py
# Demonstrate how the new sharding rules work in detail
python demo_new_sharding.py
# Test the query-log feature
python test_query_logs.py
# Integration tests (sharding + query logs)
python test_integration.py
# Test database connections and queries
# via the web UI: http://localhost:5000/db-compare
# or run the example scripts directly:
python demo/Query.py
python demo/twcsQuery.py
```
### Development mode
The application runs in debug mode by default and restarts automatically on code changes. URLs:
- http://localhost:5000 - tool collection home page
- http://localhost:5000/db-compare - database comparison tool
### Dependencies
- Flask==2.3.3
- cassandra-driver==3.29.1
## API Architecture
### Core API endpoints
- `GET /api/default-config`: get the default database configuration
- `POST /api/query`: run a single-table comparison (original feature)
- `POST /api/sharding-query`: run a sharded-table comparison (new feature)
- `GET /api/config-groups`: list all configuration groups
- `POST /api/config-groups`: create a configuration group
- `GET /api/config-groups/<id>`: get a specific configuration group
- `DELETE /api/config-groups/<id>`: delete a configuration group
- `POST /api/init-db`: initialize the SQLite database
- `GET /api/query-history`: get query history
- `POST /api/query-history`: save query history
- `GET /api/query-history/<id>`: get a specific history record
- `DELETE /api/query-history/<id>`: delete a history record
- `GET /api/query-logs`: get query logs (supports a limit parameter)
- `DELETE /api/query-logs`: clear query logs
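A client call to the single-table endpoint follows the request structure documented in this file. The sketch below only builds and inspects the payload (hosts, table names, and values are illustrative placeholders); the commented-out `requests.post` line shows how it would be sent:

```python
import json

# Request body for POST /api/query (field values are illustrative).
payload = {
    "pro_config": {"hosts": ["10.0.0.1"], "port": 9042,
                   "username": "cassandra", "password": "***",
                   "keyspace": "ks", "table": "t"},
    "test_config": {"hosts": ["10.0.0.2"], "port": 9042,
                    "username": "cassandra", "password": "***",
                    "keyspace": "ks", "table": "t"},
    "keys": ["id"],
    "fields_to_compare": [],   # empty list = compare all fields
    "exclude_fields": [],
    "values": ["1001", "1002"],
}
print(json.dumps(payload)[:60])
# To actually send it (not executed here):
# requests.post("http://localhost:5000/api/query", json=payload)
```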
### Query Comparison Flow
**Single-table flow (`/api/query`)**
1. The frontend sends the configuration and key list to `/api/query`
2. The backend opens two Cassandra connections (production + test)
3. Queries run in parallel and the raw data is fetched
4. The comparison algorithm runs and produces a difference report
5. The full result is returned (differences, statistics, raw data)
**Sharded flow (`/api/sharding-query`)**
1. The frontend sends the configuration, key list, and sharding config to `/api/sharding-query`
2. The backend uses `ShardingCalculator` to extract timestamps from the keys
3. The sharding algorithm computes the shard table name for each key
4. A shard mapping is built and the shard queries run in parallel
5. Results from all shards are merged and the comparison algorithm runs
6. The full result, including sharding info, is returned
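Steps 2-4 of the sharded flow amount to grouping keys by their computed shard so that each shard table is queried once. A hedged sketch (the helper name and the `{base_table}_{index}` pattern are assumptions, not the project's actual code):

```python
import re
from collections import defaultdict

def build_shard_mapping(keys, base_table, interval_seconds=604800, table_count=14):
    """Map each key to its shard table; keys without digits go to failed_keys."""
    mapping, failed_keys = {}, []
    for key in keys:
        digits = re.sub(r'\D', '', key)
        if not digits:
            failed_keys.append(key)
            continue
        index = int(digits) // interval_seconds % table_count
        mapping[key] = f"{base_table}_{index}"
    # Group keys per table so each shard is queried once (step 4)
    per_table = defaultdict(list)
    for key, table in mapping.items():
        per_table[table].append(key)
    return mapping, dict(per_table), failed_keys

mapping, per_table, failed = build_shard_mapping(
    ["wmid_1609459200", "wmid_1610064000", "no_digits_here"], "events")
print(per_table, failed)
```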
## Data Structures and Configuration
### Database configuration structure
**Single-table query config**
```javascript
{
  pro_config: {
    cluster_name, datacenter, hosts[], port,
    username, password, keyspace, table
  },
  test_config: { /* same as above */ },
  keys: ["primary key field"],
  fields_to_compare: ["field1", "field2"],  // empty array = all fields
  exclude_fields: ["excluded field"],
  values: ["key1", "key2", "key3"]          // key values to query
}
```
**Sharded query config**
```javascript
{
  pro_config: { /* base config as above */ },
  test_config: { /* base config as above */ },
  keys: ["primary key field"],
  fields_to_compare: ["field1", "field2"],
  exclude_fields: ["excluded field"],
  values: ["key1", "key2", "key3"],
  sharding_config: {
    use_sharding_for_pro: true,    // whether production uses sharded tables
    use_sharding_for_test: false,  // whether test uses sharded tables
    interval_seconds: 604800,      // shard time interval, default 7 days
    table_count: 14                // number of shard tables, default 14
  }
}
```
### Query result structure
```javascript
{
  total_keys, pro_count, test_count,
  differences: [{ key, field, pro_value, test_value, message }],
  identical_results: [{ key, pro_fields, test_fields }],
  field_diff_count: { "field_name": count },
  raw_pro_data: [], raw_test_data: [],
  summary: { overview, percentages, field_analysis, recommendations },
  // fields specific to sharded queries
  sharding_info: {
    pro_shard_mapping: { "key1": "table_name_0", "key2": "table_name_1" },
    test_shard_mapping: { /* same as above */ },
    failed_keys: [],  // keys whose timestamp extraction failed
    shard_stats: {
      pro_tables_used: ["table_0", "table_1"],
      test_tables_used: ["table_0"],
      timestamp_extraction_success_rate: 95.5
    }
  }
}
```
## Development Notes
### Sharding development guide
- **Timestamp parsing (updated)**: new rules for `ShardingCalculator.extract_timestamp_from_key()`
- Use `re.sub(r'\D', '', key)` to strip all non-digit characters
- Convert the extracted digit string to an integer and use it as the timestamp
- No timestamp-validity check is performed; any digit combination is accepted
- **Shard index calculation**: use the formula `int(numbers) // interval_seconds % table_count`
- **Error handling**: keys without any digit characters are recorded in `failed_keys`
- **Mixed queries**: combinations such as production sharded + test single-table are supported
- **Frontend state**: sharding mode is toggled via `toggleShardingMode()`, which updates the UI and hint text
### Cassandra connection handling
- Connections include detailed error diagnostics and a retry mechanism
- DCAwareRoundRobinPolicy is used to avoid load-balancing warnings
- The connection timeout is set to 10 seconds
- Network connectivity tests are offered on failure
- Authentication is supported (PlainTextAuthProvider)
- Cluster configuration is supported (cluster_name, datacenter)
### Frontend state management
- `currentResults`: stores the latest query results
- Pagination state: `currentIdenticalPage`, `currentDifferencePage`
- Filter state: `filteredIdenticalResults`, `filteredDifferenceResults`
- **Log state (new)**: `allQueryLogs` - stores all query logs
### JSON and array field handling
- `normalize_json_string()`: normalizes JSON strings for comparison
- `compare_array_values()`: order-independent array comparison
- `is_json_field()`: smart JSON field detection
- The frontend provides dedicated JSON syntax highlighting and tree views
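The two comparison helpers listed above can be sketched as follows. The names mirror the ones in the list, but the bodies are minimal assumptions about how such normalization typically works, not the project's actual implementations:

```python
import json

def normalize_json_string(value):
    """Parse a JSON string and re-serialize it with sorted keys so that
    formatting and key order do not cause false differences."""
    try:
        return json.dumps(json.loads(value), sort_keys=True, separators=(",", ":"))
    except (TypeError, ValueError):
        return value  # not JSON: compare the value as-is

def compare_array_values(a, b):
    """Order-independent array comparison: serialize each element with a
    canonical key order, then compare the sorted serializations."""
    norm = lambda arr: sorted(json.dumps(x, sort_keys=True) for x in arr)
    return norm(a) == norm(b)

print(normalize_json_string('{"b": 1, "a": 2}'))
print(compare_array_values([1, {"x": 1}], [{"x": 1}, 1]))
```

Serializing elements before sorting keeps the comparison deterministic even when the array mixes types (dicts are not directly orderable in Python).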
### Error Handling Strategy
- Backend: errors are classified (connection_error, validation_error, query_error, system_error)
- Frontend: detailed error display including configuration info, suggested fixes, and a connection test tool
- Interactive troubleshooting guide
- **Query logs (new)**: all SQL executions and errors are recorded in the query log
### Performance Considerations
- Pagination for large datasets
- Lazy loading of raw data
- Client-side caching of formatted JSON
- Debounced search and filtering
### SQLite Schema
**config_groups table**
- id: primary key
- name: configuration group name (unique)
- description: description
- pro_config: production config JSON
- test_config: test config JSON
- query_config: query config JSON
- **sharding_config: sharding config JSON (new field)**
- created_at/updated_at: timestamps
**query_history table**
- id: primary key
- name: query name
- description: description
- pro_config/test_config/query_config: config JSON
- query_keys: queried key values JSON
- results_summary: result summary JSON
- execution_time: execution time
- total_keys/differences_count/identical_count: statistics
- created_at: timestamp
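The config_groups column list above implies a DDL roughly like the one below. The types and constraints are reconstructed assumptions (the authoritative schema lives in the database module), shown here as an in-memory sqlite3 check:

```python
import sqlite3

# Hypothetical DDL reconstructed from the column list above;
# actual types and constraints may differ in the real schema.
DDL = """
CREATE TABLE IF NOT EXISTS config_groups (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    pro_config TEXT,       -- JSON
    test_config TEXT,      -- JSON
    query_config TEXT,     -- JSON
    sharding_config TEXT,  -- JSON (new field)
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
cols = [row[1] for row in conn.execute("PRAGMA table_info(config_groups)")]
print(cols)
```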

71
Dockerfile Normal file
View File

@@ -0,0 +1,71 @@
# BigDataTool Docker image
# Lightweight container built on the Python 3.9 Alpine image
# Use the official Python 3.9 Alpine image as the base
FROM python:3.9-alpine
# Maintainer information
LABEL maintainer="BigDataTool Team"
LABEL version="2.0"
LABEL description="BigDataTool - containerized big-data query comparison tool"
# Environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    FLASK_HOST=0.0.0.0 \
    FLASK_PORT=5000
# Working directory
WORKDIR /app
# Install system dependencies
# Build tools and runtime libraries required on Alpine
RUN apk add --no-cache \
    gcc \
    musl-dev \
    libffi-dev \
    openssl-dev \
    cargo \
    rust \
    && apk add --no-cache --virtual .build-deps \
    build-base \
    python3-dev
# Copy the requirements file
COPY requirements.txt .
# Install Python dependencies
# Use a regional mirror to speed up downloads
RUN pip install --no-cache-dir -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Remove build dependencies to shrink the image
RUN apk del .build-deps
# Copy application code
COPY . .
# Create required directories
RUN mkdir -p logs && \
    chmod +x docker-entrypoint.sh || true
# Create a non-root user to run the application
RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup
# Change file ownership
RUN chown -R appuser:appgroup /app
# Switch to the non-root user
USER appuser
# Expose the port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:5000/api/health || exit 1
# Startup command
CMD ["python", "app.py"]
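The HEALTHCHECK above probes `/api/health` with `wget --spider`, which only needs a 200 response. A minimal Flask handler of that shape might look like the sketch below (an assumption about the endpoint's body, not the project's actual implementation):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/health")
def health():
    # Lightweight liveness probe for the Docker HEALTHCHECK / wget --spider
    return jsonify(status="ok", version="2.0"), 200
```

Keeping the handler free of database or Cassandra calls makes it a pure liveness check; a readiness check that touches dependencies would be a separate endpoint.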

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024 BigDataTool Project Team
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

148
Makefile Normal file
View File

@@ -0,0 +1,148 @@
# BigDataTool Docker management Makefile
.PHONY: help build run stop clean logs shell health
# Default target
help:
	@echo "BigDataTool Docker management commands:"
	@echo ""
	@echo "  build      Build the Docker image"
	@echo "  run        Start the service (simplified stack)"
	@echo "  run-full   Start the full stack (with cache and monitoring)"
	@echo "  stop       Stop services"
	@echo "  restart    Restart services"
	@echo "  clean      Clean up containers and images"
	@echo "  logs       Tail service logs"
	@echo "  shell      Open a shell in the container"
	@echo "  health     Check service health"
	@echo "  ps         Show running status"
	@echo ""
	@echo "Environment variables:"
	@echo "  export SECRET_KEY=your-secret-key"
	@echo "  export FLASK_ENV=production"
	@echo ""
# Build the image
build:
	@echo "Building the BigDataTool Docker image..."
	docker build -t bigdatatool:latest .
# Quick run (simplified stack)
run:
	@echo "Starting BigDataTool (simplified stack)..."
	docker-compose -f docker-compose.simple.yml up -d
	@echo "Service starting; wait ~30 seconds, then open http://localhost:5000"
# Full run (with cache and monitoring)
run-full:
	@echo "Starting the full BigDataTool stack..."
	docker-compose up -d
	@echo "Services starting; wait ~30 seconds, then open:"
	@echo "  - Main app: http://localhost:5000"
	@echo "  - Redis cache: localhost:6379"
# Production run (with Nginx)
run-prod:
	@echo "Starting the production stack..."
	docker-compose --profile production up -d
	@echo "Production services started; addresses:"
	@echo "  - HTTP: http://localhost"
	@echo "  - HTTPS: https://localhost (requires an SSL certificate)"
# Monitoring run
run-monitor:
	@echo "Starting the monitoring stack..."
	docker-compose --profile monitoring up -d
	@echo "Monitoring services started; addresses:"
	@echo "  - Main app: http://localhost:5000"
	@echo "  - Prometheus: http://localhost:9090"
# Stop services
stop:
	@echo "Stopping all services..."
	docker-compose down
	docker-compose -f docker-compose.simple.yml down
# Restart services
restart: stop run
# Tail logs
logs:
	@echo "Tailing service logs..."
	docker-compose logs -f bigdatatool
# Tail logs for a specific service
logs-app:
	docker-compose logs -f bigdatatool
logs-redis:
	docker-compose logs -f redis-cache
logs-nginx:
	docker-compose logs -f nginx
# Open a shell in the container
shell:
	@echo "Entering the BigDataTool container..."
	docker-compose exec bigdatatool /bin/bash
# Health check
health:
	@echo "Checking service health..."
	@docker-compose ps
	@echo ""
	@echo "Application health check:"
	@curl -s http://localhost:5000/api/health | python -m json.tool || echo "Service not responding"
# Show running status
ps:
	@echo "Container status:"
	@docker-compose ps
# Clean up resources
clean:
	@echo "Cleaning up Docker resources..."
	docker-compose down -v --remove-orphans
	docker-compose -f docker-compose.simple.yml down -v --remove-orphans
	docker system prune -f
	@echo "Cleanup complete"
# Force cleanup (including images)
clean-all: clean
	@echo "Force-cleaning all resources..."
	docker rmi bigdatatool:latest || true
	docker volume prune -f
	docker network prune -f
# Update the image
update: clean build run
# Show resource usage
stats:
	@echo "Docker resource usage:"
	@docker stats --no-stream
# Back up data
backup:
	@echo "Backing up the database and configuration..."
	@mkdir -p backups/$(shell date +%Y%m%d_%H%M%S)
	@docker cp bigdatatool:/app/config_groups.db backups/$(shell date +%Y%m%d_%H%M%S)/
	@echo "Backup complete: backups/$(shell date +%Y%m%d_%H%M%S)/"
# Run in development mode
dev:
	@echo "Running in development mode..."
	@docker run --rm -it \
		-p 5000:5000 \
		-v $(PWD):/app \
		-e FLASK_ENV=development \
		-e FLASK_DEBUG=True \
		bigdatatool:latest
# Build and push to a registry (log in to Docker Hub first)
publish: build
	@echo "Pushing the image to Docker Hub..."
	@read -p "Docker Hub username: " username && \
	docker tag bigdatatool:latest $$username/bigdatatool:latest && \
	docker tag bigdatatool:latest $$username/bigdatatool:2.0 && \
	docker push $$username/bigdatatool:latest && \
	docker push $$username/bigdatatool:2.0

497
README.md Normal file
View File

@@ -0,0 +1,497 @@
# BigDataTool - Big Data Query Comparison Tool
[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://python.org)
[![Flask Version](https://img.shields.io/badge/flask-2.3.3-green.svg)](https://flask.palletsprojects.com/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
BigDataTool is a powerful database query comparison tool focused on data-consistency verification for Cassandra databases and Redis clusters. It supports single-table queries, TWCS sharded-table queries, multi-primary-key queries, and other complex scenarios.
## 🚀 Core Features
### Cassandra Data Comparison
- **Single-table queries**: standard production-vs-test data comparison
- **Sharded-table queries**: time-based sharding support built on the TWCS strategy
- **Multi-primary-key queries**: exact matching and comparison for composite primary keys
- **Smart data comparison**: deep comparison of complex data types such as JSON and arrays
### Redis Data Comparison
- **All types supported**: String, Hash, List, Set, ZSet, Stream, and every other Redis data type
- **Cluster support**: automatic detection of and connection to both standalone and cluster modes
- **Random sampling**: two modes, random key sampling and comparison of specified keys
- **Performance monitoring**: detailed connection-time and query-performance statistics
### Configuration Management
- **Configuration groups**: save, load, and reuse database connection configurations
- **Query history**: persistent storage of query records with one-click replay
- **Real-time logs**: detailed operation logs and performance monitoring
- **YAML import**: one-click import of YAML-format configurations
## 📋 Requirements
- Python 3.8+
- Flask 2.3.3
- Cassandra Driver 3.29.1
- Redis 5.0.1
## 🛠️ Installation and Deployment
### Quick Start
#### Option 1: Run directly (recommended for development)
```bash
# 1. Clone the project
git clone https://github.com/your-org/BigDataTool.git
cd BigDataTool
# 2. Install dependencies
pip install -r requirements.txt
# 3. Start the application
python app.py
```
#### Option 2: Docker deployment (recommended for production)
```bash
# 1. Clone the project
git clone https://github.com/your-org/BigDataTool.git
cd BigDataTool
# 2. Build and start (simplified stack)
make build
make run
# Or start directly with Docker Compose
docker-compose -f docker-compose.simple.yml up -d
```
#### Option 3: Full Docker environment (with cache and monitoring)
```bash
# Start the full service stack
docker-compose up -d
# Check service status
make ps
```
### Containerized Deployment Details
#### 🐳 Docker Image Features
- **Base image**: Python 3.9 Alpine (lightweight)
- **Image size**: < 200MB
- **Security**: runs as a non-root user
- **Health checks**: built-in application health monitoring
- **Multi-arch**: supports AMD64 and ARM64
#### 🚀 One-Command Deployment
```bash
# List all available commands
make help
# Build the image
make build
# Start the service (simplified stack)
make run
# Start the full stack (with Redis cache)
make run-full
# Start the production stack (with Nginx reverse proxy)
make run-prod
# Tail service logs
make logs
# Open a shell in the container for debugging
make shell
# Health check
make health
# Stop services
make stop
# Clean up resources
make clean
```
#### 🔧 Environment Variables
```bash
# Application secret key (must be set in production)
export SECRET_KEY="your-super-secret-key-change-in-production"
# Runtime environment
export FLASK_ENV=production
export FLASK_DEBUG=False
# Database configuration
export DATABASE_URL="sqlite:///config_groups.db"
# Security configuration
export FORCE_HTTPS=true
```
#### 📊 Service Endpoints
After startup, the following addresses are available:
**Simplified deployment**:
- Main app: http://localhost:5000
**Full deployment**:
- Main app: http://localhost:5000
- Redis cache: localhost:6379
- Prometheus monitoring: http://localhost:9090
**Production**:
- HTTP: http://localhost
- HTTPS: https://localhost
### Traditional Deployment
#### Python virtual environment
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or venv\Scripts\activate  # Windows
# Install dependencies
pip install -r requirements.txt
# Start the application
python app.py
```
#### Production deployment (Gunicorn)
```bash
# Install Gunicorn
pip install gunicorn
# Start the production server
gunicorn -w 4 -b 0.0.0.0:5000 app:app
```
## 🎯 Getting Started
### Cassandra Data Comparison
1. Open `http://localhost:5000/db-compare`
2. Configure the Cassandra connection details for the production and test environments
3. Set the primary-key fields and query parameters
4. Enter the list of key values to compare
5. Click "Start Query" to run the comparison
#### Single primary key example
```
Primary key field: id
Key values:
1001
1002
1003
```
#### Composite primary key example
```
Primary key fields: docid,id
Key values:
8825C293B3609175B2224236E984FEDB,8825C293B3609175B2224236E984FED
9925C293B3609175B2224236E984FEDB,9925C293B3609175B2224236E984FED
```
### Redis Data Comparison
1. Open `http://localhost:5000/redis-compare`
2. Configure the connection details for the two Redis clusters
3. Choose a query mode: random sampling or specified keys
4. Set the query parameters
5. Run the comparison
## 🏗️ Architecture
BigDataTool uses a modular layered architecture:
```
┌─────────────────────────────────────────┐
│            Frontend UI Layer            │
│     (HTML + JavaScript + Bootstrap)     │
└─────────────┬───────────────────────────┘
┌─────────────▼───────────────────────────┐
│            API Routing Layer            │
│              (Flask Routes)             │
└─────────────┬───────────────────────────┘
┌─────────────▼───────────────────────────┐
│           Business Logic Layer          │
│   ┌──────────────┬───────────────────┐  │
│   │ Query Engine │ Comparison Engine │  │
│   └──────────────┴───────────────────┘  │
└─────────────┬───────────────────────────┘
┌─────────────▼───────────────────────────┐
│             Data Access Layer           │
│   ┌──────────────┬───────────────────┐  │
│   │  Cassandra   │       Redis       │  │
│   │   Client     │      Client       │  │
│   └──────────────┴───────────────────┘  │
└─────────────┬───────────────────────────┘
┌─────────────▼───────────────────────────┐
│            Data Storage Layer           │
│  ┌────────┬───────────┬─────────────┐   │
│  │ SQLite │ Cassandra │    Redis    │   │
│  │(config)│  (prod)   │   (cache)   │   │
│  └────────┴───────────┴─────────────┘   │
└─────────────────────────────────────────┘
```
### Core Components
- **Query engine**: executes Cassandra and Redis queries
- **Comparison engine**: implements the smart data-comparison algorithms
- **Configuration management**: configuration persistence backed by SQLite
- **Logging system**: real-time query-log collection and display
### Data Comparison Engine
- **Smart JSON comparison**: automatically handles JSON formatting differences and nested structures
- **Order-independent array comparison**: deep comparison that ignores array element order
- **Field-level difference analysis**: detailed per-field difference statistics and hotspot analysis
- **Data quality assessment**: automatically generates consistency reports and improvement suggestions
### Sharded-Table Query Support
- **TWCS strategy**: shard calculation based on the Time Window Compaction Strategy
- **Timestamp extraction**: intelligently extracts timestamp information from keys
- **Mixed queries**: supports combinations such as production sharded + test single-table
- **Parallel queries**: queries multiple shards in parallel for better performance
### User Interface
- **Responsive design**: modern Bootstrap-based interface
- **Real-time feedback**: live display of query progress and results
- **Paginated display**: efficient pagination for large result sets
- **Multiple view modes**: raw data, formatted, diff, and more
## 🔧 Configuration
### Cassandra configuration
```json
{
  "cluster_name": "example-cluster",
  "hosts": ["127.0.0.1", "127.0.0.2"],
  "port": 9042,
  "datacenter": "dc1",
  "username": "cassandra",
  "password": "password",
  "keyspace": "example_keyspace",
  "table": "example_table"
}
```
### Redis configuration
```json
{
  "name": "example-redis",
  "nodes": [
    {"host": "127.0.0.1", "port": 6379},
    {"host": "127.0.0.2", "port": 6379}
  ],
  "password": "redis_password",
  "socket_timeout": 3,
  "socket_connect_timeout": 3,
  "max_connections_per_node": 16
}
```
## 📈 Performance
### Response times
- Single-table query (100 records): < 10 s
- Sharded query (100 records): < 15 s
- Redis query (100 keys): < 10 s
- Page load time: < 3 s
### Capacity
- Max concurrent queries: 10
- Max records per query: 10,000
- Database connections supported: unlimited
- Peak memory usage: < 1 GB
### Data Processing
- Cassandra shard auto-calculation accuracy: > 95%
- JSON deep-comparison nesting depth: unlimited
- Redis data-type coverage: 100%
- Query-history storage capacity: unlimited
## 🔄 Changelog
### v2.0 (2024-08)
- ✨ Added Redis cluster comparison
- ✨ Added composite multi-primary-key queries
- ✨ Smart data-type detection and comparison
- 🚀 Performance optimizations and UI improvements
- 📚 Complete documentation overhaul
### v1.0 (2024-07)
- 🎉 Basic Cassandra data comparison
- 🎉 TWCS sharded-table query support
- 🎉 Configuration management and query history
- 🎉 Web UI and API
## 🔍 Troubleshooting
### Common Issues
1. **Cassandra connection failures**
   - Check network connectivity: `telnet <host> <port>`
   - Verify credentials: username, password, keyspace
   - Check firewall settings
2. **Redis connection failures**
   - Check Redis service status: `redis-cli ping`
   - Verify the cluster configuration: node addresses and ports
   - Confirm password settings
3. **Query timeouts**
   - Increase the connection-timeout parameters
   - Check database server load
   - Optimize query conditions and indexes
4. **High memory usage**
   - Reduce the number of records per query
   - Process large datasets in batches
   - Periodically clear query history and logs
5. **Sharded-query failures**
   - Check that keys contain valid timestamps
   - Verify the sharding parameters
   - Confirm that the target shard tables exist
## 📝 API Documentation
### Main API Endpoints
- `POST /api/query` - single-table comparison
- `POST /api/sharding-query` - sharded-table comparison
- `POST /api/redis/compare` - Redis comparison
- `GET /api/config-groups` - list configuration groups
- `POST /api/config-groups` - create a configuration group
- `GET /api/query-history` - fetch query history
- `GET /api/query-logs` - fetch query logs
## 🤝 Contributing
All contributions are welcome! Please follow these steps:
### Workflow
1. **Fork the project**
```bash
git clone https://github.com/your-username/BigDataTool.git
cd BigDataTool
```
2. **Create a feature branch**
```bash
git checkout -b feature/amazing-feature
```
3. **Follow the coding standards**
- See the [coding standards](docs/coding-standards.md)
- Use PEP 8 Python style
- Add the necessary test cases
- Update the relevant documentation
4. **Commit your changes**
```bash
git commit -m 'feat: Add some AmazingFeature'
```
5. **Push the branch**
```bash
git push origin feature/amazing-feature
```
6. **Open a Pull Request**
- Describe what changed and why
- Make sure all tests pass
- Add screenshots or demos where helpful
### Contribution Types
- 🐛 Bug fixes
- ✨ New features
- 📚 Documentation improvements
- 🎨 UI polish
- 🚀 Performance optimizations
- 🔧 Configuration and tooling
### Code Review
Every contribution goes through code review, covering:
- Functional correctness
- Code quality
- Security assessment
- Documentation completeness
For a detailed development guide, see the [developer docs](docs/developer-guide.md).
## 🛡️ Security
BigDataTool is committed to data security:
- 🔒 **Encrypted transport**: supports HTTPS/TLS
- 🔐 **Authentication**: reserved hooks for authentication and access control
- 🔍 **Input validation**: strict validation and filtering of input parameters
- 📝 **Audit logs**: complete operation logs and security-event records
- 🛡️ **Data protection**: sensitive information is never stored in plain text
If you find a security vulnerability, please email the security team or open a private issue.
For the full security policy, see the [security guidelines](docs/security-guidelines.md).
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 👥 Team
### Core Developers
- **Project lead**: BigDataTool Project Team
- **Architecture**: system architecture team
- **Frontend**: UI/UX team
- **Backend**: data-processing engine team
- **QA**: quality assurance team
### Contributors
Thanks to everyone who has contributed to the project!
## 📞 Support and Feedback
### Issue Reporting
- 🐛 [Bug reports](https://github.com/your-org/BigDataTool/issues/new?template=bug_report.md)
- ✨ [Feature requests](https://github.com/your-org/BigDataTool/issues/new?template=feature_request.md)
- ❓ [Discussions](https://github.com/your-org/BigDataTool/discussions)
### Community
- 📚 See the [user manual](docs/user-manual.md) for detailed usage instructions
- 🔧 See the [troubleshooting guide](docs/operations.md) for common problems
- 💬 Join the community chat for real-time help
## 🙏 Acknowledgements
Thanks to the following open-source projects and communities:
- **[Flask](https://flask.palletsprojects.com/)** - lightweight web framework
- **[Cassandra](https://cassandra.apache.org/)** - distributed NoSQL database
- **[Redis](https://redis.io/)** - high-performance key-value store
- **[Bootstrap](https://getbootstrap.com/)** - frontend UI framework
- **[jQuery](https://jquery.com/)** - JavaScript library
Special thanks to every user who provided feedback, bug reports, and feature suggestions!
---
## 📊 Project Status
**Last updated**: August 6, 2024
**Current version**: v2.0
**Status**: actively maintained
> ⚠️ **Note**: This tool is intended mainly for data comparison in development and test environments. Evaluate carefully and apply proper safeguards before using it against production. Read the [security guidelines](docs/security-guidelines.md) before use.

1508
app.py

File diff suppressed because it is too large Load Diff

29
docker-compose.simple.yml Normal file
View File

@@ -0,0 +1,29 @@
# BigDataTool simplified Docker Compose configuration
# Intended for quick development and testing
version: '3.8'
services:
  bigdatatool:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: bigdatatool
    ports:
      - "8080:5000"
    environment:
      - FLASK_ENV=production
      - FLASK_DEBUG=False
      - FLASK_HOST=0.0.0.0
      - FLASK_PORT=5000
    # volumes:
    #   # Persist the database
    #   - ./data:/app/data
    #   # Persist logs
    #   - ./logs:/app/logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:5000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

237
docker-entrypoint.sh Executable file
View File

@@ -0,0 +1,237 @@
#!/bin/bash
# BigDataTool Docker entrypoint script
# Entry script for containerized deployment
set -e
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}
log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}
log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}
# Print the startup banner
show_banner() {
    echo "======================================"
    echo "  BigDataTool Container Startup"
    echo "======================================"
    echo "Version: 2.0"
    echo "Python: $(python --version)"
    echo "Working Directory: $(pwd)"
    echo "User: $(whoami)"
    echo "======================================"
}
# Check environment variables
check_environment() {
    log_info "Checking environment variables..."
    # Apply defaults
    export FLASK_ENV=${FLASK_ENV:-production}
    export FLASK_DEBUG=${FLASK_DEBUG:-False}
    export SECRET_KEY=${SECRET_KEY:-$(python -c "import secrets; print(secrets.token_hex(32))")}
    export DATABASE_URL=${DATABASE_URL:-sqlite:///config_groups.db}
    log_info "FLASK_ENV: $FLASK_ENV"
    log_info "FLASK_DEBUG: $FLASK_DEBUG"
    log_info "Database URL: $DATABASE_URL"
    # Check required variables
    if [ -z "$SECRET_KEY" ]; then
        log_warn "SECRET_KEY not set; using a randomly generated key"
    fi
}
# Initialize the database
initialize_database() {
    log_info "Initializing the database..."
    # Check whether the database file exists
    if [ ! -f "config_groups.db" ]; then
        log_info "Database file not found; it will be created"
        # Run inside `if` so `set -e` does not abort before we can log the failure
        if python -c "
from modules.database import ensure_database
if ensure_database():
    print('Database initialized')
else:
    print('Database initialization failed')
    exit(1)
"; then
            log_success "Database initialization complete"
        else
            log_error "Database initialization failed"
            exit 1
        fi
    else
        log_info "Database file already exists"
    fi
}
# Create required directories
create_directories() {
    log_info "Creating required directories..."
    # Log directory
    if [ ! -d "logs" ]; then
        mkdir -p logs
        log_info "Created log directory: logs"
    fi
    # Config directory
    if [ ! -d "config" ]; then
        mkdir -p config
        log_info "Created config directory: config"
    fi
    # Set permissions
    chmod -R 755 logs config || true
}
# Health check
health_check() {
    log_info "Running health check..."
    local max_attempts=30
    local attempt=1
    while [ $attempt -le $max_attempts ]; do
        if wget --no-verbose --tries=1 --spider http://localhost:5000/api/health >/dev/null 2>&1; then
            log_success "Application health check passed"
            return 0
        fi
        log_info "Waiting for the application to start... ($attempt/$max_attempts)"
        sleep 2
        ((attempt++))
    done
    log_error "Health check failed; the application may not have started correctly"
    return 1
}
# Signal handler
cleanup() {
    log_warn "Received exit signal; cleaning up..."
    # Add cleanup logic here,
    # e.g. flushing caches or closing database connections
    log_info "Cleanup complete; exiting"
    exit 0
}
# Register signal handlers
trap cleanup SIGTERM SIGINT
# Main startup function
start_application() {
    log_info "Starting BigDataTool..."
    # Choose the startup mode from the environment
    if [ "$FLASK_ENV" = "development" ]; then
        log_info "Starting in development mode"
        python app.py
    else
        log_info "Starting in production mode"
        # Use Gunicorn if it is installed
        if command -v gunicorn >/dev/null 2>&1; then
            log_info "Starting the application with Gunicorn"
            exec gunicorn \
                --bind 0.0.0.0:5000 \
                --workers 4 \
                --worker-class sync \
                --worker-connections 1000 \
                --max-requests 1000 \
                --max-requests-jitter 50 \
                --timeout 120 \
                --keep-alive 5 \
                --log-level info \
                --access-logfile - \
                --error-logfile - \
                app:app
        else
            log_warn "Gunicorn not installed; falling back to the Flask development server"
            python app.py
        fi
    fi
}
# Show help
show_help() {
    echo "BigDataTool Docker entrypoint script"
    echo ""
    echo "Usage: $0 [option]"
    echo ""
    echo "Options:"
    echo "  start         Start the application (default)"
    echo "  health-check  Run a health check"
    echo "  init-db       Initialize the database only"
    echo "  shell         Open an interactive shell"
    echo "  help          Show this help"
    echo ""
    echo "Environment variables:"
    echo "  FLASK_ENV     Flask environment (development/production)"
    echo "  FLASK_DEBUG   Enable debug mode (True/False)"
    echo "  SECRET_KEY    Application secret key"
    echo "  DATABASE_URL  Database connection URL"
    echo ""
}
# Main logic
main() {
    case "${1:-start}" in
        start)
            show_banner
            check_environment
            create_directories
            initialize_database
            start_application
            ;;
        health-check)
            health_check
            ;;
        init-db)
            log_info "Database-initialization-only mode"
            check_environment
            create_directories
            initialize_database
            log_success "Database initialization complete"
            ;;
        shell)
            log_info "Opening an interactive shell"
            exec /bin/bash
            ;;
        help|--help|-h)
            show_help
            ;;
        *)
            log_error "Unknown option: $1"
            show_help
            exit 1
            ;;
    esac
}
# Run the main function
main "$@"

17
modules/__init__.py Normal file
View File

@@ -0,0 +1,17 @@
"""
BigDataTool Modules
This directory contains all functional modules for the BigDataTool application.
Module List:
- database.py - Database management
- query_logger.py - Query logging management
- sharding.py - Sharding calculations
- cassandra_client.py - Cassandra connections
- query_engine.py - Data query engine
- data_comparison.py - Data comparison algorithms
- config_manager.py - Configuration management
- api_routes.py - API route definitions
Each module has clear responsibility boundaries and standardized interfaces.
"""

1280
modules/api_routes.py Normal file

File diff suppressed because it is too large Load Diff

196
modules/cassandra_client.py Normal file
View File

@@ -0,0 +1,196 @@
"""
Cassandra连接管理模块
====================
本模块负责Cassandra数据库的连接管理和高级错误诊断功能。
核心功能:
1. 智能连接管理:自动处理集群连接和故障转移
2. 错误诊断系统:详细的连接失败分析和解决建议
3. 性能监控:连接时间和集群状态的实时监控
4. 容错机制:连接超时、重试和优雅降级
5. 安全认证支持用户名密码认证和SSL连接
连接特性:
- 负载均衡使用DCAwareRoundRobinPolicy避免单点故障
- 连接池管理:优化的连接复用和资源管理
- 超时控制:可配置的连接和查询超时时间
- 协议版本使用稳定的CQL协议版本4
- Schema同步自动等待集群Schema一致性
错误诊断系统:
- 连接拒绝:检查服务状态和网络连通性
- 认证失败:验证用户名密码和权限设置
- 超时错误:分析网络延迟和服务器负载
- Keyspace错误验证Keyspace存在性和访问权限
- 未知错误:提供通用的故障排查指南
监控功能:
- 集群状态:实时显示可用和故障节点
- 连接时间:精确的连接建立时间测量
- 元数据获取:集群名称和节点信息展示
- 性能指标:连接成功率和响应时间统计
作者BigDataTool项目组
更新时间2024年8月
"""
import time
import logging
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy
logger = logging.getLogger(__name__)
def create_connection(config):
    """
    Create a Cassandra database connection with enhanced error diagnostics and fault tolerance.

    This function provides production-grade Cassandra connection management, including:
    - Smart connection setup: automatically selects sensible connection parameters
    - Detailed error diagnostics: concrete fixes for each failure category
    - Performance monitoring: records connection time and cluster state
    - Fault tolerance: graceful degradation on connection failure

    Args:
        config (dict): Cassandra connection configuration with the following fields:
            - hosts (list): Cassandra node addresses
            - port (int): connection port, default 9042
            - username (str): authentication username
            - password (str): authentication password
            - keyspace (str): target keyspace name
            - datacenter (str): data center name, default 'dc1'

    Returns:
        tuple: (cluster, session) connection objects
            - cluster: Cassandra cluster object, used to manage the connection
            - session: database session object, used to execute queries
            - (None, None) on connection failure

    Connection tuning:
        - Protocol version: stable protocol version 4
        - Connect timeout: 15 seconds, to avoid long waits
        - Load balancing: DCAwareRoundRobinPolicy, to avoid cross-DC queries
        - Schema agreement: 30-second schema-consistency wait
        - Query timeout: 30-second default query timeout

    Error diagnostics:
        - Connection refused: suggests service-status checks
        - Authentication failure: suggests verifying user permissions
        - Timeout: suggests network and performance tuning
        - Keyspace errors: suggests keyspace creation and permission checks

    Example:
        config = {
            'hosts': ['127.0.0.1'],
            'port': 9042,
            'username': 'cassandra',
            'password': 'password',
            'keyspace': 'example_keyspace',
            'datacenter': 'dc1'
        }
        cluster, session = create_connection(config)
        if session:
            result = session.execute("SELECT * FROM example_table LIMIT 10")
            cluster.shutdown()
    """
start_time = time.time()
logger.info(f"=== 开始创建Cassandra连接 ===")
logger.info(f"主机列表: {config.get('hosts', [])}")
logger.info(f"端口: {config.get('port', 9042)}")
logger.info(f"用户名: {config.get('username', 'N/A')}")
logger.info(f"Keyspace: {config.get('keyspace', 'N/A')}")
try:
logger.info("正在创建认证提供者...")
auth_provider = PlainTextAuthProvider(username=config['username'], password=config['password'])
logger.info("正在创建集群连接...")
# 设置负载均衡策略,避免单点故障
load_balancing_policy = DCAwareRoundRobinPolicy(local_dc=config.get('datacenter', 'dc1'))
# 创建连接配置,增加容错参数
cluster = Cluster(
config['hosts'],
port=config['port'],
auth_provider=auth_provider,
load_balancing_policy=load_balancing_policy,
# 增加容错配置
protocol_version=4, # 使用稳定的协议版本
connect_timeout=15, # 连接超时
control_connection_timeout=15, # 控制连接超时
max_schema_agreement_wait=30 # schema同步等待时间
)
logger.info("正在连接到Keyspace...")
session = cluster.connect(config['keyspace'])
# 设置session级别的容错参数
session.default_timeout = 30 # 查询超时时间
connection_time = time.time() - start_time
logger.info(f"✅ Cassandra连接成功: 连接时间={connection_time:.3f}")
# 记录集群状态
try:
cluster_name = cluster.metadata.cluster_name or "Unknown"
logger.info(f" 集群名称: {cluster_name}")
# 记录可用主机状态
live_hosts = [str(host.address) for host in cluster.metadata.all_hosts() if host.is_up]
down_hosts = [str(host.address) for host in cluster.metadata.all_hosts() if not host.is_up]
logger.info(f" 可用节点: {live_hosts} ({len(live_hosts)}个)")
if down_hosts:
logger.warning(f" 故障节点: {down_hosts} ({len(down_hosts)}个)")
except Exception as meta_error:
logger.warning(f"无法获取集群元数据: {meta_error}")
return cluster, session
except Exception as e:
connection_time = time.time() - start_time
error_msg = str(e)
logger.error(f"❌ Cassandra connection failed after {connection_time:.3f}s")
logger.error(f"Error type: {type(e).__name__}")
logger.error(f"Error details: {error_msg}")
# Emit targeted diagnostics based on the error message
if "connection refused" in error_msg.lower() or "unable to connect" in error_msg.lower():
logger.error("❌ Diagnosis: cannot reach the Cassandra server")
logger.error("🔧 Check that:")
logger.error("  1. The Cassandra service is running")
logger.error("  2. The host address and port are correct")
logger.error("  3. No firewall is blocking the connection")
elif "timeout" in error_msg.lower():
logger.error("❌ Diagnosis: connection timed out")
logger.error("🔧 Check that:")
logger.error("  1. Network latency is acceptable")
logger.error("  2. The Cassandra server is not overloaded")
logger.error("  3. Connection timeouts are large enough")
elif "authentication" in error_msg.lower() or "unauthorized" in error_msg.lower():
logger.error("❌ Diagnosis: authentication failed")
logger.error("🔧 Check that:")
logger.error("  1. The username and password are correct")
logger.error("  2. The user has permission to access the keyspace")
elif "keyspace" in error_msg.lower():
logger.error("❌ Diagnosis: keyspace does not exist")
logger.error("🔧 Check that:")
logger.error("  1. The keyspace name is correct")
logger.error("  2. The keyspace has been created")
else:
logger.error("❌ Diagnosis: unknown connection error")
logger.error("🔧 Suggestions:")
logger.error("  1. Verify all connection parameters")
logger.error("  2. Inspect the Cassandra server logs")
logger.error("  3. Test network connectivity")
return None, None
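The diagnostic branches above boil down to keyword matching on the error message. A runnable sketch of that classification (the function name and sample messages are illustrative, not part of the module):

```python
def classify_connection_error(error_msg):
    """Map a connection error message to one of the diagnostic buckets above."""
    m = error_msg.lower()
    if "connection refused" in m or "unable to connect" in m:
        return "unreachable"
    if "timeout" in m:
        return "timeout"
    if "authentication" in m or "unauthorized" in m:
        return "auth"
    if "keyspace" in m:
        return "keyspace"
    return "unknown"
```

Note the branch order matters: a message containing both "unable to connect" and "timeout" is classified as unreachable first.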

modules/config_manager.py (new file, 823 lines)
@@ -0,0 +1,823 @@
"""
Configuration management module
===============================
This module handles configuration and query-history management for BigDataTool,
providing full CRUD operations.

Core features:
1. Cassandra config groups: save, load, and delete database connection configs
2. Redis config groups: full lifecycle management of Redis cluster configs
3. Query history: persistent storage and retrieval of query records
4. Config parsing and validation: lightweight parsing of YAML-style configs

Supported config types:
- Cassandra: cluster hosts, credentials, keyspace, etc.
- Redis: cluster nodes, connection parameters, query options
- Query: primary-key fields, compared fields, excluded fields
- Sharding: TWCS sharding parameters, time interval, table count

Storage format:
- All configs are stored as JSON in a SQLite database
- Nested structures and arrays are supported
- Serialization and deserialization are handled automatically
- Data types are preserved

Design notes:
- Type safety: parameter validation and type checks
- Transaction safety: atomic database operations
- Error recovery: graceful degradation on database failures
- Backward compatibility: automatic upgrade of legacy config formats

Author: BigDataTool team
Updated: August 2024
"""
import json
import logging
from datetime import datetime
from .database import ensure_database, get_db_connection
logger = logging.getLogger(__name__)
def convert_bytes_to_str(obj):
"""Recursively convert bytes values to strings so the object can be JSON-serialized.
Args:
obj: Object to convert (dict, list, tuple, or any other type).
Returns:
The converted object, with every bytes value replaced by its hex string.
"""
if isinstance(obj, bytes):
# Render bytes as a hex string
return obj.hex()
elif isinstance(obj, dict):
# Recurse into dicts
return {key: convert_bytes_to_str(value) for key, value in obj.items()}
elif isinstance(obj, list):
# Recurse into lists
return [convert_bytes_to_str(item) for item in obj]
elif isinstance(obj, tuple):
# Recurse into tuples
return tuple(convert_bytes_to_str(item) for item in obj)
else:
# Leave everything else untouched
return obj
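To make the recursion concrete, here is a self-contained copy of the same logic (repeated only so the snippet runs on its own) applied to a nested structure:

```python
def to_jsonable(obj):
    """Recursively replace bytes with their hex representation (mirrors convert_bytes_to_str)."""
    if isinstance(obj, bytes):
        return obj.hex()
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(to_jsonable(v) for v in obj)
    return obj

# Nested bytes at any depth become hex strings, everything else is untouched
converted = to_jsonable({"blob": b"\x01\xff", "rows": [b"\x00", 42]})
```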
# Default Cassandra configuration template
# Note: contains no sensitive data; used only to seed the UI form
DEFAULT_CONFIG = {
'pro_config': {
'cluster_name': '',
'hosts': [],
'port': 9042,
'datacenter': '',
'username': '',
'password': '',
'keyspace': '',
'table': ''
},
'test_config': {
'cluster_name': '',
'hosts': [],
'port': 9042,
'datacenter': '',
'username': '',
'password': '',
'keyspace': '',
'table': ''
},
'keys': [],
'fields_to_compare': [],
'exclude_fields': []
}
# Default Redis cluster configuration template
# Supports single-node and cluster modes; the connection type is auto-detected
REDIS_DEFAULT_CONFIG = {
'cluster1_config': {
'name': '生产集群',
'nodes': [
{'host': '127.0.0.1', 'port': 7000}
],
'password': '',
'socket_timeout': 3,
'socket_connect_timeout': 3,
'max_connections_per_node': 16
},
'cluster2_config': {
'name': '测试集群',
'nodes': [
{'host': '127.0.0.1', 'port': 7001}
],
'password': '',
'socket_timeout': 3,
'socket_connect_timeout': 3,
'max_connections_per_node': 16
},
'query_options': {
'mode': 'random',
'count': 100,
'pattern': '*',
'source_cluster': 'cluster2',
'keys': []
}
}
def save_redis_config_group(name, description, cluster1_config, cluster2_config, query_options):
"""Save a Redis config group."""
if not ensure_database():
logger.error("数据库初始化失败")
return False
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
INSERT OR REPLACE INTO redis_config_groups
(name, description, cluster1_config, cluster2_config, query_options, updated_at)
VALUES (?, ?, ?, ?, ?, ?)
''', (
name, description,
json.dumps(cluster1_config),
json.dumps(cluster2_config),
json.dumps(query_options),
datetime.now().isoformat()
))
conn.commit()
logger.info(f"Redis配置组 '{name}' 保存成功")
return True
except Exception as e:
logger.error(f"保存Redis配置组失败: {e}")
return False
finally:
conn.close()
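The function above relies on `json.dumps`/`json.loads` round-tripping the config intact through the SQLite TEXT columns. A quick check of that assumption on a sample cluster config (values are illustrative):

```python
import json

cluster_config = {"nodes": [{"host": "127.0.0.1", "port": 7000}], "password": "", "socket_timeout": 3}
stored = json.dumps(cluster_config)   # what goes into the TEXT column
restored = json.loads(stored)         # what comes back out on load
```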
def get_redis_config_groups():
"""Return all Redis config groups."""
if not ensure_database():
logger.error("数据库初始化失败")
return []
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT id, name, description, created_at, updated_at
FROM redis_config_groups
ORDER BY updated_at DESC
''')
rows = cursor.fetchall()
config_groups = []
for row in rows:
config_groups.append({
'id': row['id'],
'name': row['name'],
'description': row['description'],
'created_at': row['created_at'],
'updated_at': row['updated_at']
})
return config_groups
except Exception as e:
logger.error(f"获取Redis配置组失败: {e}")
return []
finally:
conn.close()
def get_redis_config_group_by_id(group_id):
"""Return the details of a Redis config group by ID."""
if not ensure_database():
logger.error("数据库初始化失败")
return None
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT id, name, description, cluster1_config, cluster2_config, query_options,
created_at, updated_at
FROM redis_config_groups WHERE id = ?
''', (group_id,))
row = cursor.fetchone()
if row:
config = {
'id': row['id'],
'name': row['name'],
'description': row['description'],
'cluster1_config': json.loads(row['cluster1_config']),
'cluster2_config': json.loads(row['cluster2_config']),
'query_options': json.loads(row['query_options']),
'created_at': row['created_at'],
'updated_at': row['updated_at']
}
return config
return None
except Exception as e:
logger.error(f"获取Redis配置组详情失败: {e}")
return None
finally:
conn.close()
def delete_redis_config_group(group_id):
"""Delete a Redis config group."""
if not ensure_database():
logger.error("数据库初始化失败")
return False
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('DELETE FROM redis_config_groups WHERE id = ?', (group_id,))
conn.commit()
success = cursor.rowcount > 0
if success:
logger.info(f"Redis配置组ID {group_id} 删除成功")
return success
except Exception as e:
logger.error(f"删除Redis配置组失败: {e}")
return False
finally:
conn.close()
def save_redis_query_history(name, description, cluster1_config, cluster2_config, query_options,
query_keys, results_summary, execution_time, total_keys,
different_count, identical_count, missing_count, raw_results=None):
"""Save a Redis query history record and return its ID."""
if not ensure_database():
logger.error("数据库初始化失败")
return None
conn = get_db_connection()
cursor = conn.cursor()
try:
# Convert any bytes values before JSON serialization
raw_results = convert_bytes_to_str(raw_results) if raw_results else None
cursor.execute('''
INSERT INTO redis_query_history
(name, description, cluster1_config, cluster2_config, query_options, query_keys,
results_summary, execution_time, total_keys, different_count, identical_count,
missing_count, raw_results)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
''', (
name, description,
json.dumps(cluster1_config),
json.dumps(cluster2_config),
json.dumps(query_options),
json.dumps(query_keys),
json.dumps(results_summary),
execution_time,
total_keys,
different_count,
identical_count,
missing_count,
json.dumps(raw_results) if raw_results else None
))
# Grab the ID of the inserted row
history_id = cursor.lastrowid
conn.commit()
logger.info(f"Redis查询历史记录 '{name}' 保存成功ID{history_id}")
return history_id
except Exception as e:
logger.error(f"保存Redis查询历史记录失败: {e}")
return None
finally:
conn.close()
def get_redis_query_history():
"""Return Redis query history records."""
if not ensure_database():
logger.error("数据库初始化失败")
return []
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT id, name, description, execution_time, total_keys,
different_count, identical_count, missing_count, created_at
FROM redis_query_history
ORDER BY created_at DESC
''')
rows = cursor.fetchall()
history_list = []
for row in rows:
history_list.append({
'id': row['id'],
'name': row['name'],
'description': row['description'],
'execution_time': row['execution_time'],
'total_keys': row['total_keys'],
'different_count': row['different_count'],
'identical_count': row['identical_count'],
'missing_count': row['missing_count'],
'created_at': row['created_at']
})
return history_list
except Exception as e:
logger.error(f"获取Redis查询历史记录失败: {e}")
return []
finally:
conn.close()
def get_redis_query_history_by_id(history_id):
"""Return the details of a Redis query history record by ID."""
if not ensure_database():
logger.error("数据库初始化失败")
return None
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT * FROM redis_query_history WHERE id = ?
''', (history_id,))
row = cursor.fetchone()
if row:
return {
'id': row['id'],
'name': row['name'],
'description': row['description'],
'cluster1_config': json.loads(row['cluster1_config']),
'cluster2_config': json.loads(row['cluster2_config']),
'query_options': json.loads(row['query_options']),
'query_keys': json.loads(row['query_keys']),
'results_summary': json.loads(row['results_summary']),
'execution_time': row['execution_time'],
'total_keys': row['total_keys'],
'different_count': row['different_count'],
'identical_count': row['identical_count'],
'missing_count': row['missing_count'],
'created_at': row['created_at'],
'raw_results': json.loads(row['raw_results']) if row['raw_results'] else None
}
return None
except Exception as e:
logger.error(f"获取Redis查询历史记录详情失败: {e}")
return None
finally:
conn.close()
def delete_redis_query_history(history_id):
"""Delete a Redis query history record."""
if not ensure_database():
logger.error("数据库初始化失败")
return False
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('DELETE FROM redis_query_history WHERE id = ?', (history_id,))
conn.commit()
success = cursor.rowcount > 0
if success:
logger.info(f"Redis查询历史记录ID {history_id} 删除成功")
return success
except Exception as e:
logger.error(f"删除Redis查询历史记录失败: {e}")
return False
finally:
conn.close()
def batch_delete_redis_query_history(history_ids):
"""Batch-delete Redis query history records."""
if not history_ids:
return {'success': True, 'message': '没有要删除的记录', 'deleted_count': 0}
if not ensure_database():
logger.error("数据库初始化失败")
return {'success': False, 'error': '数据库初始化失败', 'deleted_count': 0}
conn = get_db_connection()
cursor = conn.cursor()
try:
# Build the placeholders for the IN clause
placeholders = ','.join(['?' for _ in history_ids])
sql = f'DELETE FROM redis_query_history WHERE id IN ({placeholders})'
cursor.execute(sql, history_ids)
conn.commit()
deleted_count = cursor.rowcount
if deleted_count > 0:
logger.info(f"成功批量删除 {deleted_count} 条Redis查询历史记录: {history_ids}")
return {
'success': True,
'message': f'成功删除 {deleted_count} 条记录',
'deleted_count': deleted_count
}
else:
return {
'success': False,
'error': '没有找到要删除的记录',
'deleted_count': 0
}
except Exception as e:
logger.error(f"批量删除Redis查询历史记录失败: {e}")
return {
'success': False,
'error': f'删除失败: {str(e)}',
'deleted_count': 0
}
finally:
conn.close()
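The batch delete builds one `?` placeholder per ID so the values are bound as parameters rather than interpolated into the SQL string. A minimal sketch of that placeholder construction (IDs are hypothetical):

```python
history_ids = [3, 7, 9]  # hypothetical record IDs
placeholders = ",".join("?" for _ in history_ids)
sql = f"DELETE FROM redis_query_history WHERE id IN ({placeholders})"
# cursor.execute(sql, history_ids) then binds each ID safely
```

Only the placeholder list is formatted into the statement; the IDs themselves always travel through the parameter tuple, which avoids SQL injection.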
def parse_redis_config_from_yaml(yaml_text):
"""Parse a Redis config from YAML-style text (simple key: value lines, not full YAML)."""
try:
config = {}
lines = yaml_text.strip().split('\n')
for line in lines:
line = line.strip()
if ':' in line:
key, value = line.split(':', 1)
key = key.strip()
value = value.strip()
# Strip surrounding quotes
if value.startswith('"') and value.endswith('"'):
value = value[1:-1]
elif value.startswith("'") and value.endswith("'"):
value = value[1:-1]
config[key] = value
# Convert to the Redis cluster config format
redis_config = {
'name': config.get('clusterName', ''),
'nodes': [],
'password': config.get('clusterPassword', ''),
'socket_timeout': 3,
'socket_connect_timeout': 3,
'max_connections_per_node': 16
}
# Parse the address
cluster_address = config.get('clusterAddress', '')
if cluster_address:
if ':' in cluster_address:
host, port = cluster_address.split(':', 1)
redis_config['nodes'] = [{'host': host, 'port': int(port)}]
else:
redis_config['nodes'] = [{'host': cluster_address, 'port': 6379}]
return redis_config
except Exception as e:
logger.error(f"解析Redis配置失败: {e}")
return None
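The address handling above splits `host:port` once and falls back to the default Redis port when none is given. The same logic in isolation (the sample address is illustrative):

```python
cluster_address = "10.0.0.5:7000"  # hypothetical clusterAddress value
if ":" in cluster_address:
    host, port = cluster_address.split(":", 1)
    nodes = [{"host": host, "port": int(port)}]
else:
    # No port given: assume the default Redis port
    nodes = [{"host": cluster_address, "port": 6379}]
```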
def save_config_group(name, description, pro_config, test_config, query_config, sharding_config=None):
"""Save a config group."""
if not ensure_database():
logger.error("数据库初始化失败")
return False
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
INSERT OR REPLACE INTO config_groups
(name, description, pro_config, test_config, query_config, sharding_config, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
''', (
name, description,
json.dumps(pro_config),
json.dumps(test_config),
json.dumps(query_config),
json.dumps(sharding_config) if sharding_config else None,
datetime.now().isoformat()
))
conn.commit()
logger.info(f"配置组 '{name}' 保存成功,包含分表配置: {sharding_config is not None}")
return True
except Exception as e:
logger.error(f"保存配置组失败: {e}")
return False
finally:
conn.close()
def get_config_groups():
"""Return all config groups."""
if not ensure_database():
logger.error("数据库初始化失败")
return []
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT id, name, description, created_at, updated_at
FROM config_groups
ORDER BY updated_at DESC
''')
rows = cursor.fetchall()
config_groups = []
for row in rows:
config_groups.append({
'id': row['id'],
'name': row['name'],
'description': row['description'],
'created_at': row['created_at'],
'updated_at': row['updated_at']
})
return config_groups
except Exception as e:
logger.error(f"获取配置组失败: {e}")
return []
finally:
conn.close()
def get_config_group_by_id(group_id):
"""Return the details of a config group by ID."""
if not ensure_database():
logger.error("数据库初始化失败")
return None
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT id, name, description, pro_config, test_config, query_config,
sharding_config, created_at, updated_at
FROM config_groups WHERE id = ?
''', (group_id,))
row = cursor.fetchone()
if row:
config = {
'id': row['id'],
'name': row['name'],
'description': row['description'],
'pro_config': json.loads(row['pro_config']),
'test_config': json.loads(row['test_config']),
'query_config': json.loads(row['query_config']),
'created_at': row['created_at'],
'updated_at': row['updated_at']
}
# Attach the sharding config
if row['sharding_config']:
try:
config['sharding_config'] = json.loads(row['sharding_config'])
except (json.JSONDecodeError, TypeError):
config['sharding_config'] = None
else:
config['sharding_config'] = None
return config
return None
except Exception as e:
logger.error(f"获取配置组详情失败: {e}")
return None
finally:
conn.close()
def delete_config_group(group_id):
"""Delete a config group."""
if not ensure_database():
logger.error("数据库初始化失败")
return False
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('DELETE FROM config_groups WHERE id = ?', (group_id,))
conn.commit()
success = cursor.rowcount > 0
if success:
logger.info(f"配置组ID {group_id} 删除成功")
return success
except Exception as e:
logger.error(f"删除配置组失败: {e}")
return False
finally:
conn.close()
def save_query_history(name, description, pro_config, test_config, query_config, query_keys,
results_summary, execution_time, total_keys, differences_count, identical_count,
sharding_config=None, query_type='single', raw_results=None, differences_data=None, identical_data=None):
"""Save a query history record (supports sharded queries and stored result data); returns the record ID."""
if not ensure_database():
logger.error("数据库初始化失败")
return None
conn = get_db_connection()
cursor = conn.cursor()
try:
# Convert any bytes values before JSON serialization
raw_results = convert_bytes_to_str(raw_results) if raw_results else None
differences_data = convert_bytes_to_str(differences_data) if differences_data else None
identical_data = convert_bytes_to_str(identical_data) if identical_data else None
cursor.execute('''
INSERT INTO query_history
(name, description, pro_config, test_config, query_config, query_keys,
results_summary, execution_time, total_keys, differences_count, identical_count,
sharding_config, query_type, raw_results, differences_data, identical_data)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
''', (
name, description,
json.dumps(pro_config),
json.dumps(test_config),
json.dumps(query_config),
json.dumps(query_keys),
json.dumps(results_summary),
execution_time,
total_keys,
differences_count,
identical_count,
json.dumps(sharding_config) if sharding_config else None,
query_type,
json.dumps(raw_results) if raw_results else None,
json.dumps(differences_data) if differences_data else None,
json.dumps(identical_data) if identical_data else None
))
# Grab the ID of the inserted row
history_id = cursor.lastrowid
conn.commit()
logger.info(f"查询历史记录 '{name}' 保存成功,查询类型:{query_type}ID{history_id}")
return history_id
except Exception as e:
logger.error(f"保存查询历史记录失败: {e}")
return None
finally:
conn.close()
def get_query_history():
"""Return all query history records."""
if not ensure_database():
logger.error("数据库初始化失败")
return []
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT id, name, description, execution_time, total_keys,
differences_count, identical_count, created_at, query_type
FROM query_history
ORDER BY created_at DESC
''')
rows = cursor.fetchall()
history_list = []
# Column names, used to check whether optional fields exist (computed once, not per row)
column_names = [desc[0] for desc in cursor.description]
for row in rows:
history_list.append({
'id': row['id'],
'name': row['name'],
'description': row['description'],
'execution_time': row['execution_time'],
'total_keys': row['total_keys'],
'differences_count': row['differences_count'],
'identical_count': row['identical_count'],
'created_at': row['created_at'],
'query_type': row['query_type'] if 'query_type' in column_names else 'single'
})
return history_list
except Exception as e:
logger.error(f"获取查询历史记录失败: {e}")
return []
finally:
conn.close()
def get_query_history_by_id(history_id):
"""Return the details of a query history record by ID."""
if not ensure_database():
logger.error("数据库初始化失败")
return None
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('''
SELECT * FROM query_history WHERE id = ?
''', (history_id,))
row = cursor.fetchone()
if row:
# Column names, used to check whether optional fields exist
column_names = [desc[0] for desc in cursor.description]
return {
'id': row['id'],
'name': row['name'],
'description': row['description'],
'pro_config': json.loads(row['pro_config']),
'test_config': json.loads(row['test_config']),
'query_config': json.loads(row['query_config']),
'query_keys': json.loads(row['query_keys']),
'results_summary': json.loads(row['results_summary']),
'execution_time': row['execution_time'],
'total_keys': row['total_keys'],
'differences_count': row['differences_count'],
'identical_count': row['identical_count'],
'created_at': row['created_at'],
# Newer fields, kept backward compatible
'sharding_config': json.loads(row['sharding_config']) if 'sharding_config' in column_names and row['sharding_config'] else None,
'query_type': row['query_type'] if 'query_type' in column_names else 'single',
# Optional stored query result data
'raw_results': json.loads(row['raw_results']) if 'raw_results' in column_names and row['raw_results'] else None,
'differences_data': json.loads(row['differences_data']) if 'differences_data' in column_names and row['differences_data'] else None,
'identical_data': json.loads(row['identical_data']) if 'identical_data' in column_names and row['identical_data'] else None
}
return None
except Exception as e:
logger.error(f"获取查询历史记录详情失败: {e}")
return None
finally:
conn.close()
def delete_query_history(history_id):
"""Delete a query history record."""
if not ensure_database():
logger.error("数据库初始化失败")
return False
conn = get_db_connection()
cursor = conn.cursor()
try:
cursor.execute('DELETE FROM query_history WHERE id = ?', (history_id,))
conn.commit()
success = cursor.rowcount > 0
if success:
logger.info(f"查询历史记录ID {history_id} 删除成功")
return success
except Exception as e:
logger.error(f"删除查询历史记录失败: {e}")
return False
finally:
conn.close()
def batch_delete_query_history(history_ids):
"""Batch-delete Cassandra query history records."""
if not history_ids:
return {'success': True, 'message': '没有要删除的记录', 'deleted_count': 0}
if not ensure_database():
logger.error("数据库初始化失败")
return {'success': False, 'error': '数据库初始化失败', 'deleted_count': 0}
conn = get_db_connection()
cursor = conn.cursor()
try:
# Build the placeholders for the IN clause
placeholders = ','.join(['?' for _ in history_ids])
sql = f'DELETE FROM query_history WHERE id IN ({placeholders})'
cursor.execute(sql, history_ids)
conn.commit()
deleted_count = cursor.rowcount
if deleted_count > 0:
logger.info(f"成功批量删除 {deleted_count} 条Cassandra查询历史记录: {history_ids}")
return {
'success': True,
'message': f'成功删除 {deleted_count} 条记录',
'deleted_count': deleted_count
}
else:
return {
'success': False,
'error': '没有找到要删除的记录',
'deleted_count': 0
}
except Exception as e:
logger.error(f"批量删除Cassandra查询历史记录失败: {e}")
return {
'success': False,
'error': f'删除失败: {str(e)}',
'deleted_count': 0
}
finally:
conn.close()

modules/data_comparison.py (new file, 447 lines)
@@ -0,0 +1,447 @@
"""
Data comparison engine
======================
This module is BigDataTool's data comparison engine, providing advanced
difference analysis.

Core features:
1. Dataset comparison: precise production-vs-test data comparison
2. Smart JSON comparison: deep comparison of complex JSON structures
3. Order-insensitive array comparison: element-level matching
4. Composite primary keys: exact multi-field key matching
5. Difference analysis: detailed per-field difference statistics
6. Data quality assessment: automatic consistency reporting

Comparison algorithm notes:
- JSON normalization: tolerates formatting differences (whitespace, key order)
- Smart array comparison: deep comparison ignoring element order
- Type tolerance: handles string/number type mismatches
- Encoding: robust UTF-8 and binary data handling
- Performance: efficient comparison of large datasets

Supported data types:
- Primitives: strings, numbers, booleans, null
- JSON objects: recursive comparison of nested objects
- JSON arrays: element-level matching
- Binary data: byte-level comparison
- Composite keys: multi-field matching

Output:
- Difference records: detailed per-field differences
- Statistics: quantified consistency analysis
- Quality assessment: quality level and improvement suggestions
- Performance metrics: comparison timing statistics

Author: BigDataTool team
Updated: August 2024
"""
import json
import logging
logger = logging.getLogger(__name__)
def compare_results(pro_data, test_data, keys, fields_to_compare, exclude_fields, values):
"""Compare query results; supports composite primary keys."""
differences = []
field_diff_count = {}
identical_results = []  # Rows that matched exactly
def match_composite_key(row, composite_value, keys):
"""Check whether a row matches the given composite key value."""
if len(keys) == 1:
# Single primary key
return getattr(row, keys[0]) == composite_value
else:
# Composite primary key
if isinstance(composite_value, str) and ',' in composite_value:
key_values = [v.strip() for v in composite_value.split(',')]
if len(key_values) == len(keys):
return all(str(getattr(row, key)) == key_val for key, key_val in zip(keys, key_values))
# Not a composite value; fall back to matching the first key only
return getattr(row, keys[0]) == composite_value
for value in values:
# Collect the rows matching this key value in both the production and test tables
rows_pro = [row for row in pro_data if match_composite_key(row, value, keys)]
rows_test = [row for row in test_data if match_composite_key(row, value, keys)]
for row_pro in rows_pro:
# Find the row with the same primary key in the test table
row_test = next(
(row for row in rows_test if all(getattr(row, key) == getattr(row_pro, key) for key in keys)),
None
)
if row_test:
# Determine which columns to compare
columns = fields_to_compare if fields_to_compare else row_pro._fields
columns = [col for col in columns if col not in exclude_fields]
has_difference = False
row_differences = []
identical_fields = {}
for column in columns:
value_pro = getattr(row_pro, column)
value_test = getattr(row_test, column)
# Smart comparison; the field name enables tag-field detection
if not compare_values(value_pro, value_test, column):
has_difference = True
# Format values for display
formatted_pro_value = format_json_for_display(value_pro)
formatted_test_value = format_json_for_display(value_test)
row_differences.append({
'key': {key: getattr(row_pro, key) for key in keys},
'field': column,
'pro_value': formatted_pro_value,
'test_value': formatted_test_value,
'is_json': is_json_field(value_pro) or is_json_field(value_test),
'is_array': is_json_array_field(value_pro) or is_json_array_field(value_test),
'is_tag': is_tag_field(column, value_pro) or is_tag_field(column, value_test)
})
# Count per-field differences
field_diff_count[column] = field_diff_count.get(column, 0) + 1
else:
# Record the identical field value
identical_fields[column] = format_json_for_display(value_pro)
if has_difference:
differences.extend(row_differences)
else:
# No differences; record the row as identical
identical_results.append({
'key': {key: getattr(row_pro, key) for key in keys},
'pro_fields': identical_fields,
'test_fields': {col: format_json_for_display(getattr(row_test, col)) for col in columns}
})
else:
# No matching row found in the test table
differences.append({
'key': {key: getattr(row_pro, key) for key in keys},
'message': '在测试表中未找到该行'
})
# Check for rows that exist in test but not in production
for row_test in rows_test:
row_pro = next(
(row for row in rows_pro if all(getattr(row, key) == getattr(row_test, key) for key in keys)),
None
)
if not row_pro:
differences.append({
'key': {key: getattr(row_test, key) for key in keys},
'message': '在生产表中未找到该行'
})
return differences, field_diff_count, identical_results
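The inner `match_composite_key` helper accepts either a plain key value or a comma-separated composite such as `"42, eu"`. A standalone sketch with a namedtuple standing in for a Cassandra row (the `Row` fields are hypothetical):

```python
from collections import namedtuple

def match_composite_key(row, composite_value, keys):
    """Match 'v1, v2' position-by-position against the row's key columns."""
    if len(keys) == 1:
        return getattr(row, keys[0]) == composite_value
    if isinstance(composite_value, str) and "," in composite_value:
        parts = [p.strip() for p in composite_value.split(",")]
        if len(parts) == len(keys):
            return all(str(getattr(row, k)) == p for k, p in zip(keys, parts))
    # Not a composite value: fall back to the first key only
    return getattr(row, keys[0]) == composite_value

Row = namedtuple("Row", ["user_id", "region"])  # hypothetical row shape
```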
def normalize_json_string(value):
"""Normalize a JSON string for comparison."""
if not isinstance(value, str):
return value
try:
# Try to parse as JSON
json_obj = json.loads(value)
if isinstance(json_obj, list):
# Arrays need order-insensitive normalization
normalized_array = normalize_json_array(json_obj)
return json.dumps(normalized_array, sort_keys=True, separators=(',', ':'))
else:
# Plain object: serialize with sorted keys
return json.dumps(json_obj, sort_keys=True, separators=(',', ':'))
except (json.JSONDecodeError, TypeError):
# Not JSON; return unchanged
return value
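For the non-array case, normalization is just "parse, then re-serialize with sorted keys and fixed separators", which makes whitespace and key order irrelevant:

```python
import json

def normalize(s):
    """Canonical JSON form: sorted keys, no cosmetic whitespace."""
    return json.dumps(json.loads(s), sort_keys=True, separators=(",", ":"))

a = '{ "b": 1, "a": 2 }'   # pretty-ish formatting, keys out of order
b = '{"a":2,"b":1}'        # compact, sorted
```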
def normalize_json_array(json_array):
"""Normalize a JSON array so element order becomes irrelevant."""
try:
normalized_elements = []
for element in json_array:
if isinstance(element, dict):
# Canonicalize dict elements
normalized_elements.append(json.dumps(element, sort_keys=True, separators=(',', ':')))
elif isinstance(element, str):
# Strings may themselves be JSON; try to parse them
try:
parsed_element = json.loads(element)
normalized_elements.append(json.dumps(parsed_element, sort_keys=True, separators=(',', ':')))
except (json.JSONDecodeError, TypeError):
normalized_elements.append(element)
else:
normalized_elements.append(element)
# Sort the canonical forms so the ordering is deterministic
normalized_elements.sort()
# Parse canonical strings back into objects
result_array = []
for element in normalized_elements:
if isinstance(element, str):
try:
result_array.append(json.loads(element))
except (json.JSONDecodeError, TypeError):
result_array.append(element)
else:
result_array.append(element)
return result_array
except Exception as e:
logger.warning(f"Array normalization failed: {e}")
return json_array
def is_json_array_field(value):
"""Return True if the value looks like a JSON array."""
if not isinstance(value, (str, list)):
return False
try:
if isinstance(value, str):
parsed = json.loads(value)
return isinstance(parsed, list)
elif isinstance(value, list):
# A list whose first element is a JSON string also counts
if len(value) > 0 and isinstance(value[0], str):
try:
json.loads(value[0])
return True
except (json.JSONDecodeError, TypeError):
return False
return True
except (json.JSONDecodeError, TypeError):
return False
def compare_array_values(value1, value2):
"""Compare two values that may be arrays (in string or list form)."""
try:
# Both values are strings: parse and compare as arrays
if isinstance(value1, str) and isinstance(value2, str):
try:
array1 = json.loads(value1)
array2 = json.loads(value2)
if isinstance(array1, list) and isinstance(array2, list):
return compare_json_arrays(array1, array2)
except (json.JSONDecodeError, TypeError):
pass
# Both are Python lists
elif isinstance(value1, list) and isinstance(value2, list):
return compare_json_arrays(value1, value2)
# Mixed case: one list, one JSON string
elif isinstance(value1, list) and isinstance(value2, str):
try:
array2 = json.loads(value2)
if isinstance(array2, list):
return compare_json_arrays(value1, array2)
except (json.JSONDecodeError, TypeError):
pass
elif isinstance(value1, str) and isinstance(value2, list):
try:
array1 = json.loads(value1)
if isinstance(array1, list):
return compare_json_arrays(array1, value2)
except (json.JSONDecodeError, TypeError):
pass
return False
except Exception as e:
logger.warning(f"Array comparison failed: {e}")
return False
def compare_json_arrays(array1, array2):
"""Compare two JSON arrays, ignoring element order."""
try:
if len(array1) != len(array2):
return False
# Normalize both arrays
normalized_array1 = normalize_json_array(array1.copy())
normalized_array2 = normalize_json_array(array2.copy())
# Serialize to a canonical, comparable form
comparable1 = json.dumps(normalized_array1, sort_keys=True)
comparable2 = json.dumps(normalized_array2, sort_keys=True)
return comparable1 == comparable2
except Exception as e:
logger.warning(f"JSON array comparison failed: {e}")
return False
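The essence of order-insensitive array comparison is canonicalizing each element and sorting the canonical forms before comparing. A compact sketch of that idea (not the module's exact implementation):

```python
import json

def arrays_equal(a1, a2):
    """Order-insensitive comparison via a canonical per-element serialization."""
    if len(a1) != len(a2):
        return False
    canon = lambda arr: sorted(json.dumps(e, sort_keys=True) for e in arr)
    return canon(a1) == canon(a2)
```

The length check matters: without it, duplicate elements could make two different arrays canonicalize identically after deduplicating comparisons.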
def format_json_for_display(value):
"""Format a value as pretty-printed JSON for display."""
# Handle None
if value is None:
return "null"
# Non-strings are shown via str()
if not isinstance(value, str):
return str(value)
try:
json_obj = json.loads(value)
# Pretty-print with indentation
return json.dumps(json_obj, sort_keys=True, indent=2, ensure_ascii=False)
except (json.JSONDecodeError, TypeError):
# Not JSON; return as-is
return str(value)
def is_json_field(value):
"""Return True if the value is a JSON string."""
if not isinstance(value, str):
return False
try:
json.loads(value)
return True
except (json.JSONDecodeError, TypeError):
return False
def is_tag_field(field_name, value):
"""Return True if this looks like a tag field (space-separated tag list).
A tag field:
1. has 'tag' in its field name,
2. holds a string value,
3. contains multiple space-separated elements.
"""
if not isinstance(value, str):
return False
# The field name must contain 'tag'
if field_name and 'tag' in field_name.lower():
# The value must contain more than one space-separated element
elements = value.strip().split()
if len(elements) > 1:
return True
return False
def compare_tag_values(value1, value2):
"""Compare tag fields, ignoring order.
Space-separated tag strings are split into sets before comparison.
"""
if not isinstance(value1, str) or not isinstance(value2, str):
return value1 == value2
# Split into tag sets
tags1 = set(value1.strip().split())
tags2 = set(value2.strip().split())
# Order-insensitive set equality
return tags1 == tags2
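Splitting on whitespace and comparing sets makes tag order irrelevant (and, as a side effect of using sets, repeated tags count once):

```python
def tags_equal(v1, v2):
    """Order-insensitive comparison of space-separated tag strings."""
    if not (isinstance(v1, str) and isinstance(v2, str)):
        return v1 == v2
    return set(v1.split()) == set(v2.split())
```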
def compare_values(value1, value2, field_name=None):
"""Smart comparison supporting JSON normalization, array comparison, and tag comparison.
Args:
value1: First value.
value2: Second value.
field_name: Optional field name, used for tag-field detection.
"""
# Tag fields: order-insensitive set comparison
if field_name and (is_tag_field(field_name, value1) or is_tag_field(field_name, value2)):
return compare_tag_values(value1, value2)
# Arrays: order-insensitive array comparison
if is_json_array_field(value1) or is_json_array_field(value2):
return compare_array_values(value1, value2)
# Both strings: compare after JSON normalization
if isinstance(value1, str) and isinstance(value2, str):
normalized_value1 = normalize_json_string(value1)
normalized_value2 = normalize_json_string(value2)
return normalized_value1 == normalized_value2
# Everything else: plain equality
return value1 == value2
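The dispatch order above (tags, then arrays, then normalized JSON, then plain equality) can be condensed into a self-contained sketch; this simplified version only handles tags and key-order-insensitive JSON, and its names are illustrative:

```python
import json

def smart_equal(v1, v2, field_name=None):
    """Simplified dispatch: tag fields first, then JSON-normalized, then plain equality."""
    if field_name and "tag" in field_name.lower() and isinstance(v1, str) and isinstance(v2, str):
        return set(v1.split()) == set(v2.split())
    if isinstance(v1, str) and isinstance(v2, str):
        try:
            return json.dumps(json.loads(v1), sort_keys=True) == json.dumps(json.loads(v2), sort_keys=True)
        except (json.JSONDecodeError, TypeError):
            return v1 == v2
    return v1 == v2
```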
def generate_comparison_summary(total_keys, pro_count, test_count, differences, identical_results, field_diff_count):
"""Generate a comparison summary report."""
# Basic statistics
different_records = len(set(list(diff['key'].values())[0] for diff in differences if 'field' in diff))
identical_records = len(identical_results)
missing_in_test = len([diff for diff in differences if diff.get('message') == '在测试表中未找到该行'])
missing_in_pro = len([diff for diff in differences if diff.get('message') == '在生产表中未找到该行'])
# Percentages
def safe_percentage(part, total):
return round((part / total * 100), 2) if total > 0 else 0
identical_percentage = safe_percentage(identical_records, total_keys)
different_percentage = safe_percentage(different_records, total_keys)
# Build the summary
summary = {
'overview': {
'total_keys_queried': total_keys,
'pro_records_found': pro_count,
'test_records_found': test_count,
'identical_records': identical_records,
'different_records': different_records,
'missing_in_test': missing_in_test,
'missing_in_pro': missing_in_pro
},
'percentages': {
'data_consistency': identical_percentage,
'data_differences': different_percentage,
'missing_rate': safe_percentage(missing_in_test + missing_in_pro, total_keys)
},
'field_analysis': {
'total_fields_compared': len(field_diff_count) if field_diff_count else 0,
'most_different_fields': sorted(field_diff_count.items(), key=lambda x: x[1], reverse=True)[:5] if field_diff_count else []
},
'data_quality': {
'completeness': safe_percentage(pro_count + test_count, total_keys * 2),
'consistency_score': identical_percentage,
'quality_level': get_quality_level(identical_percentage)
},
'recommendations': generate_recommendations(identical_percentage, missing_in_test, missing_in_pro, field_diff_count)
}
return summary
def get_quality_level(consistency_percentage):
"""Map a consistency percentage to a data quality level."""
if consistency_percentage >= 95:
return {'level': '优秀', 'color': 'success', 'description': '数据一致性非常高'}
elif consistency_percentage >= 90:
return {'level': '良好', 'color': 'info', 'description': '数据一致性较高'}
elif consistency_percentage >= 80:
return {'level': '一般', 'color': 'warning', 'description': '数据一致性中等,需要关注'}
else:
return {'level': '较差', 'color': 'danger', 'description': '数据一致性较低,需要重点处理'}
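The grading thresholds reduce to a simple ordered cascade; here is that cascade in isolation (English level names used for illustration, not the module's display strings):

```python
def quality_level(consistency_percentage):
    """Thresholds mirror get_quality_level: 95 / 90 / 80."""
    if consistency_percentage >= 95:
        return "excellent"
    if consistency_percentage >= 90:
        return "good"
    if consistency_percentage >= 80:
        return "fair"
    return "poor"
```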
def generate_recommendations(consistency_percentage, missing_in_test, missing_in_pro, field_diff_count):
"""Generate improvement recommendations."""
recommendations = []
if consistency_percentage < 90:
recommendations.append('建议重点关注数据一致性问题,检查数据同步机制')
if missing_in_test > 0:
recommendations.append(f'测试环境缺失 {missing_in_test} 条记录,建议检查数据迁移过程')
if missing_in_pro > 0:
recommendations.append(f'生产环境缺失 {missing_in_pro} 条记录,建议检查数据完整性')
if field_diff_count:
top_diff_field = max(field_diff_count.items(), key=lambda x: x[1])
recommendations.append(f'字段 "{top_diff_field[0]}" 差异最多({top_diff_field[1]}次),建议优先处理')
if not recommendations:
recommendations.append('数据质量良好,建议继续保持当前的数据管理流程')
return recommendations

modules/database.py (new file, 318 lines)
@@ -0,0 +1,318 @@
"""
Database management module
==========================
This module manages BigDataTool's SQLite database, including:

Core features:
1. Database initialization and table creation
2. Connection management and transaction handling
3. Schema versioning and dynamic column addition
4. Integrity checks and automatic repair

Tables:
- config_groups: config groups (Cassandra/Redis connection configs)
- query_history: query history (single-table/sharded/Redis queries)
- sharding_config_groups: sharding config groups (TWCS parameters)
- query_logs: query logs (live operation logs and performance monitoring)
- redis_config_groups: Redis config groups (cluster connection configs)
- redis_query_history: Redis query history (Redis comparison records)

Design notes:
- Automated schema management: dynamic column addition and upgrades
- Backward compatibility: legacy data remains accessible
- Error recovery: tables are rebuilt automatically if corrupted
- Index optimization: indexes designed for query performance

Usage:
- ensure_database(): ensure the database and tables exist
- get_db_connection(): get a standard database connection
- init_database(): initialize the database manually (usually called automatically)

Author: BigDataTool team
Updated: August 2024
"""
import sqlite3
import json
import os
import logging
from datetime import datetime
logger = logging.getLogger(__name__)
DATABASE_PATH = 'config_groups.db'
def init_database():
"""
初始化SQLite数据库和所有必要的表结构
创建以下数据表:
1. config_groups - Cassandra配置组存储
2. query_history - 查询历史记录存储
3. sharding_config_groups - 分表配置组存储
4. query_logs - 查询日志存储
5. redis_config_groups - Redis配置组存储
6. redis_query_history - Redis查询历史存储
同时创建必要的索引以优化查询性能。
Returns:
bool: 初始化成功返回True失败返回False
注意:
- 使用IF NOT EXISTS确保重复调用安全
- 自动创建性能优化索引
- 支持外键约束和级联删除
"""
try:
conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()
# Configuration groups table
cursor.execute('''
CREATE TABLE IF NOT EXISTS config_groups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
description TEXT,
pro_config TEXT NOT NULL,
test_config TEXT NOT NULL,
query_config TEXT NOT NULL,
sharding_config TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Query history table, including the sharding-config column
cursor.execute('''
CREATE TABLE IF NOT EXISTS query_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
description TEXT,
pro_config TEXT NOT NULL,
test_config TEXT NOT NULL,
query_config TEXT NOT NULL,
query_keys TEXT NOT NULL,
results_summary TEXT NOT NULL,
execution_time REAL NOT NULL,
total_keys INTEGER NOT NULL,
differences_count INTEGER NOT NULL,
identical_count INTEGER NOT NULL,
sharding_config TEXT,
query_type TEXT DEFAULT 'single',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Sharding configuration groups table
cursor.execute('''
CREATE TABLE IF NOT EXISTS sharding_config_groups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
description TEXT,
pro_config TEXT NOT NULL,
test_config TEXT NOT NULL,
query_config TEXT NOT NULL,
sharding_config TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Query logs table
cursor.execute('''
CREATE TABLE IF NOT EXISTS query_logs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
batch_id TEXT NOT NULL,
history_id INTEGER,
timestamp TEXT NOT NULL,
level TEXT NOT NULL,
message TEXT NOT NULL,
query_type TEXT DEFAULT 'single',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (history_id) REFERENCES query_history (id) ON DELETE CASCADE
)
''')
# Redis configuration groups table
cursor.execute('''
CREATE TABLE IF NOT EXISTS redis_config_groups (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
description TEXT,
cluster1_config TEXT NOT NULL,
cluster2_config TEXT NOT NULL,
query_options TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Redis query history table
cursor.execute('''
CREATE TABLE IF NOT EXISTS redis_query_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
description TEXT,
cluster1_config TEXT NOT NULL,
cluster2_config TEXT NOT NULL,
query_options TEXT NOT NULL,
query_keys TEXT NOT NULL,
results_summary TEXT NOT NULL,
execution_time REAL NOT NULL,
total_keys INTEGER NOT NULL,
different_count INTEGER NOT NULL,
identical_count INTEGER NOT NULL,
missing_count INTEGER NOT NULL,
raw_results TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
# Indexes
cursor.execute('CREATE INDEX IF NOT EXISTS idx_query_logs_batch_id ON query_logs(batch_id)')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_query_logs_history_id ON query_logs(history_id)')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_query_logs_timestamp ON query_logs(timestamp)')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_query_logs_level ON query_logs(level)')
conn.commit()
conn.close()
logger.info("数据库初始化完成")
return True
except Exception as e:
logger.error(f"数据库初始化失败: {e}")
return False
def ensure_database():
"""
确保数据库文件和表结构完整存在
执行以下检查和操作:
1. 检查数据库文件是否存在,不存在则创建
2. 验证所有必要表是否存在,缺失则重建
3. 检查表结构是否完整,缺少字段则动态添加
4. 确保索引完整性
支持的表结构升级:
- config_groups表添加sharding_config字段
- query_history表添加sharding_config、query_type、raw_results等字段
- query_logs表添加history_id外键字段
Returns:
bool: 数据库就绪返回True初始化失败返回False
特性:
- 向后兼容:支持从旧版本数据库升级
- 自动修复:检测到问题时自动重建
- 零停机:升级过程不影响现有数据
"""
if not os.path.exists(DATABASE_PATH):
logger.info("数据库文件不存在,正在创建...")
return init_database()
# Check that all required tables exist
try:
conn = sqlite3.connect(DATABASE_PATH)
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name IN ('config_groups', 'query_history', 'sharding_config_groups', 'query_logs', 'redis_config_groups', 'redis_query_history')")
results = cursor.fetchall()
existing_tables = [row[0] for row in results]
required_tables = ['config_groups', 'query_history', 'sharding_config_groups', 'query_logs', 'redis_config_groups', 'redis_query_history']
missing_tables = [table for table in required_tables if table not in existing_tables]
if missing_tables:
logger.info(f"数据库表不完整,缺少表:{missing_tables},正在重新创建...")
return init_database()
# Check whether config_groups has the sharding_config column
cursor.execute("PRAGMA table_info(config_groups)")
columns = cursor.fetchall()
column_names = [column[1] for column in columns]
if 'sharding_config' not in column_names:
logger.info("添加sharding_config字段到config_groups表...")
cursor.execute("ALTER TABLE config_groups ADD COLUMN sharding_config TEXT")
conn.commit()
logger.info("sharding_config字段添加成功")
# Check query_history for the sharding-related columns
cursor.execute("PRAGMA table_info(query_history)")
history_columns = cursor.fetchall()
history_column_names = [column[1] for column in history_columns]
if 'sharding_config' not in history_column_names:
logger.info("添加sharding_config字段到query_history表...")
cursor.execute("ALTER TABLE query_history ADD COLUMN sharding_config TEXT")
conn.commit()
logger.info("query_history表sharding_config字段添加成功")
if 'query_type' not in history_column_names:
logger.info("添加query_type字段到query_history表...")
cursor.execute("ALTER TABLE query_history ADD COLUMN query_type TEXT DEFAULT 'single'")
conn.commit()
logger.info("query_history表query_type字段添加成功")
# Columns for storing query result data
if 'raw_results' not in history_column_names:
logger.info("添加raw_results字段到query_history表...")
cursor.execute("ALTER TABLE query_history ADD COLUMN raw_results TEXT")
conn.commit()
logger.info("query_history表raw_results字段添加成功")
if 'differences_data' not in history_column_names:
logger.info("添加differences_data字段到query_history表...")
cursor.execute("ALTER TABLE query_history ADD COLUMN differences_data TEXT")
conn.commit()
logger.info("query_history表differences_data字段添加成功")
if 'identical_data' not in history_column_names:
logger.info("添加identical_data字段到query_history表...")
cursor.execute("ALTER TABLE query_history ADD COLUMN identical_data TEXT")
conn.commit()
logger.info("query_history表identical_data字段添加成功")
# Check query_logs for the history_id column
cursor.execute("PRAGMA table_info(query_logs)")
logs_columns = cursor.fetchall()
logs_column_names = [column[1] for column in logs_columns]
if 'history_id' not in logs_column_names:
logger.info("添加history_id字段到query_logs表...")
cursor.execute("ALTER TABLE query_logs ADD COLUMN history_id INTEGER")
# Index for the foreign-key column
cursor.execute('CREATE INDEX IF NOT EXISTS idx_query_logs_history_id ON query_logs(history_id)')
conn.commit()
logger.info("query_logs表history_id字段添加成功")
conn.close()
return True
except Exception as e:
logger.error(f"检查数据库表失败: {e}")
return init_database()
def get_db_connection():
"""
获取配置好的SQLite数据库连接
返回一个配置了Row工厂的数据库连接支持
- 字典式访问查询结果row['column_name']
- 自动类型转换
- 标准的SQLite连接功能
Returns:
sqlite3.Connection: 配置好的数据库连接对象
使用示例:
conn = get_db_connection()
cursor = conn.cursor()
cursor.execute("SELECT * FROM config_groups")
rows = cursor.fetchall()
for row in rows:
print(row['name']) # 字典式访问
conn.close()
"""
conn = sqlite3.connect(DATABASE_PATH)
conn.row_factory = sqlite3.Row
return conn
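The Row-factory pattern that get_db_connection() sets up can be seen end to end against a throwaway in-memory database (the one-column table here is a minimal stand-in, not the project's full schema):

```python
import sqlite3

# Same access pattern get_db_connection() enables, on an in-memory database.
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows become dict-accessible
conn.execute("CREATE TABLE config_groups (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO config_groups (name) VALUES ('demo')")
row = conn.execute("SELECT id, name FROM config_groups").fetchone()
print(row['name'])  # dict-style access instead of positional indexing
conn.close()
```

Setting `row_factory` costs nothing at query time; it only changes how fetched rows are wrapped, which is why the module applies it unconditionally.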

modules/query_engine.py Normal file

@@ -0,0 +1,295 @@
"""
数据查询引擎模块
================
本模块是BigDataTool的核心查询引擎负责Cassandra数据库的高级查询功能。
核心功能:
1. 单表查询标准的Cassandra CQL查询执行
2. 分表查询基于TWCS策略的时间分表查询
3. 多主键查询:支持复合主键的复杂查询条件
4. 混合查询:生产环境分表+测试环境单表的组合查询
查询类型支持:
- 单主键查询WHERE key IN (val1, val2, val3)
- 复合主键查询WHERE (key1='val1' AND key2='val2') OR (key1='val3' AND key2='val4')
- 分表查询:自动计算分表名称并并行查询多张表
- 字段过滤:支持指定查询字段和排除字段
分表查询特性:
- 时间戳提取从Key中智能提取时间戳信息
- 分表计算基于TWCS策略计算目标分表
- 并行查询:同时查询多张分表以提高性能
- 错误容错:单个分表查询失败不影响整体结果
性能优化:
- 查询时间监控:记录每个查询的执行时间
- 批量处理支持大批量Key的高效查询
- 连接复用:优化数据库连接的使用
- 内存管理:大结果集的内存友好处理
作者BigDataTool项目组
更新时间2024年8月
"""
import time
import logging
from .sharding import ShardingCalculator
logger = logging.getLogger(__name__)
def execute_query(session, table, keys, fields, values, exclude_fields=None):
"""
执行Cassandra数据库查询支持单主键和复合主键查询
本函数是查询引擎的核心,能够智能处理不同类型的主键查询:
- 单主键:生成 WHERE key IN (val1, val2, val3) 查询
- 复合主键:生成 WHERE (key1='val1' AND key2='val2') OR ... 查询
Args:
session: Cassandra数据库会话对象
table (str): 目标表名
keys (list): 主键字段名列表,如 ['id'] 或 ['docid', 'id']
fields (list): 要查询的字段列表,空列表表示查询所有字段
values (list): 查询值列表,复合主键值用逗号分隔
exclude_fields (list, optional): 要排除的字段列表
Returns:
list: 查询结果列表每个元素是一个Row对象
查询示例:
# 单主键查询
execute_query(session, 'users', ['id'], ['name', 'email'], ['1', '2', '3'])
# 生成SQL: SELECT name, email FROM users WHERE id IN ('1', '2', '3')
# 复合主键查询
execute_query(session, 'orders', ['user_id', 'order_id'], ['*'], ['1,100', '2,200'])
# 生成SQL: SELECT * FROM orders WHERE (user_id='1' AND order_id='100') OR (user_id='2' AND order_id='200')
错误处理:
- 参数验证检查keys和values是否为空
- SQL注入防护对查询值进行适当转义
- 异常捕获:数据库错误时返回空列表
- 日志记录记录查询SQL和执行统计
"""
try:
# Parameter validation
if not keys or len(keys) == 0:
logger.error("Keys参数为空无法构建查询")
return []
if not values or len(values) == 0:
logger.error("Values参数为空无法构建查询")
return []
# Build the WHERE predicate
if len(keys) == 1:
# Single-key query: double any embedded single quotes, then build an IN list
quoted_values = ["'" + str(value).replace("'", "''") + "'" for value in values]
query_conditions = f"{keys[0]} IN ({', '.join(quoted_values)})"
else:
# Composite-key query
conditions = []
for value in values:
# Does the value contain the composite-key separator?
if isinstance(value, str) and ',' in value:
# Split the composite-key value into its parts
key_values = [v.strip() for v in value.split(',')]
if len(key_values) == len(keys):
# Build one composite condition: (key1='val1' AND key2='val2')
key_conditions = []
for key, val in zip(keys, key_values):
escaped = val.replace("'", "''")
key_conditions.append(f"{key} = '{escaped}'")
conditions.append(f"({' AND '.join(key_conditions)})")
else:
logger.warning(f"复合主键值 '{value}' 的字段数量({len(key_values)})与主键字段数量({len(keys)})不匹配")
# Fall back to treating the whole value as the first key's value
escaped = value.replace("'", "''")
conditions.append(f"{keys[0]} = '{escaped}'")
else:
# A plain value: treat it as the first key's value
escaped = str(value).replace("'", "''")
conditions.append(f"{keys[0]} = '{escaped}'")
if conditions:
query_conditions = ' OR '.join(conditions)
else:
logger.error("无法构建有效的查询条件")
return []
# Decide which columns to select
if fields:
fields_str = ", ".join(fields)
else:
fields_str = "*"
query_sql = f"SELECT {fields_str} FROM {table} WHERE {query_conditions};"
# Log the generated CQL
logger.info(f"执行查询SQL: {query_sql}")
if len(keys) > 1:
logger.info(f"复合主键查询参数: 表={table}, 主键字段={keys}, 字段={fields_str}, Key数量={len(values)}")
else:
logger.info(f"单主键查询参数: 表={table}, 主键字段={keys[0]}, 字段={fields_str}, Key数量={len(values)}")
# Execute the query
start_time = time.time()
result = session.execute(query_sql)
execution_time = time.time() - start_time
result_list = list(result) if result else []
logger.info(f"查询完成: 执行时间={execution_time:.3f}秒, 返回记录数={len(result_list)}")
return result_list
except Exception as e:
logger.error(f"查询执行失败: SQL={query_sql if 'query_sql' in locals() else 'N/A'}, 错误={str(e)}")
return []
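Condensed, the predicate-building logic above reduces to one small pure function. This sketch is for illustration only; it doubles single quotes so quoted literals stay well-formed, and mirrors the same fallback for arity mismatches:

```python
def build_where(keys, values):
    """Sketch of the predicate shapes generated for single and composite keys."""
    def quote(v):
        return "'" + str(v).replace("'", "''") + "'"  # double embedded quotes
    if len(keys) == 1:
        return f"{keys[0]} IN ({', '.join(quote(v) for v in values)})"
    conditions = []
    for value in values:
        parts = [p.strip() for p in str(value).split(',')]
        if len(parts) == len(keys):
            conditions.append('(' + ' AND '.join(
                f"{k} = {quote(p)}" for k, p in zip(keys, parts)) + ')')
        else:  # arity mismatch: fall back to the first key
            conditions.append(f"{keys[0]} = {quote(value)}")
    return ' OR '.join(conditions)

print(build_where(['id'], ['1', '2']))
print(build_where(['user_id', 'order_id'], ['1,100']))
```

For production use, Cassandra prepared statements with bound parameters would be the safer alternative to string interpolation altogether.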
def execute_sharding_query(session, shard_mapping, keys, fields, exclude_fields=None):
"""
执行分表查询
:param session: Cassandra会话
:param shard_mapping: 分表映射 {table_name: [keys]}
:param keys: 主键字段名列表
:param fields: 要查询的字段列表
:param exclude_fields: 要排除的字段列表
:return: (查询结果列表, 查询到的表列表, 查询失败的表列表)
"""
all_results = []
queried_tables = []
error_tables = []
logger.info(f"开始执行分表查询,涉及 {len(shard_mapping)} 张分表")
total_start_time = time.time()
for table_name, table_keys in shard_mapping.items():
try:
logger.info(f"查询分表 {table_name},包含 {len(table_keys)} 个key: {table_keys}")
# Run the query against this shard
table_results = execute_query(session, table_name, keys, fields, table_keys, exclude_fields)
all_results.extend(table_results)
queried_tables.append(table_name)
logger.info(f"分表 {table_name} 查询成功,返回 {len(table_results)} 条记录")
except Exception as e:
logger.error(f"分表 {table_name} 查询失败: {e}")
error_tables.append(table_name)
total_execution_time = time.time() - total_start_time
logger.info(f"分表查询总计完成: 执行时间={total_execution_time:.3f}秒, 成功表数={len(queried_tables)}, 失败表数={len(error_tables)}, 总记录数={len(all_results)}")
return all_results, queried_tables, error_tables
def execute_mixed_query(pro_session, test_session, pro_config, test_config, keys, fields_to_compare, values, exclude_fields, sharding_config):
"""
执行混合查询(生产环境分表,测试环境可能单表或分表)
"""
results = {
'pro_data': [],
'test_data': [],
'sharding_info': {
'calculation_stats': {}
}
}
# Production-side query
if sharding_config.get('use_sharding_for_pro', False):
# Production sharding parameters: prefer the dedicated values, fall back to the shared ones
pro_interval = sharding_config.get('pro_interval_seconds') or sharding_config.get('interval_seconds', 604800)
pro_table_count = sharding_config.get('pro_table_count') or sharding_config.get('table_count', 14)
# Log the production sharding configuration
logger.info(f"=== 生产环境分表配置 ===")
logger.info(f"启用分表查询: True")
logger.info(f"时间间隔: {pro_interval}秒 ({pro_interval//86400}天)")
logger.info(f"分表数量: {pro_table_count}")
logger.info(f"基础表名: {pro_config['table']}")
pro_calculator = ShardingCalculator(
interval_seconds=pro_interval,
table_count=pro_table_count
)
pro_shard_mapping, pro_failed_keys, pro_calc_stats = pro_calculator.get_all_shard_tables_for_keys(
pro_config['table'], values
)
logger.info(f"生产环境分表映射结果: 涉及{len(pro_shard_mapping)}张分表, 失败Key数量: {len(pro_failed_keys)}")
pro_data, pro_queried_tables, pro_error_tables = execute_sharding_query(
pro_session, pro_shard_mapping, keys, fields_to_compare, exclude_fields
)
results['pro_data'] = pro_data
results['sharding_info']['pro_shards'] = {
'enabled': True,
'interval_seconds': pro_interval,
'table_count': pro_table_count,
'queried_tables': pro_queried_tables,
'error_tables': pro_error_tables,
'failed_keys': pro_failed_keys
}
results['sharding_info']['calculation_stats'].update(pro_calc_stats)
else:
# Production single-table query
logger.info(f"=== 生产环境单表配置 ===")
logger.info(f"启用分表查询: False")
logger.info(f"表名: {pro_config['table']}")
pro_data = execute_query(pro_session, pro_config['table'], keys, fields_to_compare, values, exclude_fields)
results['pro_data'] = pro_data
results['sharding_info']['pro_shards'] = {
'enabled': False,
'queried_tables': [pro_config['table']]
}
# Test-side query
if sharding_config.get('use_sharding_for_test', False):
# Test sharding parameters: prefer the dedicated values, fall back to the shared ones
test_interval = sharding_config.get('test_interval_seconds') or sharding_config.get('interval_seconds', 604800)
test_table_count = sharding_config.get('test_table_count') or sharding_config.get('table_count', 14)
# Log the test sharding configuration
logger.info(f"=== 测试环境分表配置 ===")
logger.info(f"启用分表查询: True")
logger.info(f"时间间隔: {test_interval}秒 ({test_interval//86400}天)")
logger.info(f"分表数量: {test_table_count}")
logger.info(f"基础表名: {test_config['table']}")
test_calculator = ShardingCalculator(
interval_seconds=test_interval,
table_count=test_table_count
)
test_shard_mapping, test_failed_keys, test_calc_stats = test_calculator.get_all_shard_tables_for_keys(
test_config['table'], values
)
logger.info(f"测试环境分表映射结果: 涉及{len(test_shard_mapping)}张分表, 失败Key数量: {len(test_failed_keys)}")
test_data, test_queried_tables, test_error_tables = execute_sharding_query(
test_session, test_shard_mapping, keys, fields_to_compare, exclude_fields
)
results['test_data'] = test_data
results['sharding_info']['test_shards'] = {
'enabled': True,
'interval_seconds': test_interval,
'table_count': test_table_count,
'queried_tables': test_queried_tables,
'error_tables': test_error_tables,
'failed_keys': test_failed_keys
}
# Merge calculation statistics
if not results['sharding_info']['calculation_stats']:
results['sharding_info']['calculation_stats'] = test_calc_stats
else:
# Test single-table query
logger.info(f"=== 测试环境单表配置 ===")
logger.info(f"启用分表查询: False")
logger.info(f"表名: {test_config['table']}")
test_data = execute_query(test_session, test_config['table'], keys, fields_to_compare, values, exclude_fields)
results['test_data'] = test_data
results['sharding_info']['test_shards'] = {
'enabled': False,
'queried_tables': [test_config['table']]
}
return results
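ShardingCalculator itself lives in modules/sharding.py and is not part of this diff. Under the TWCS parameters used above (interval_seconds, table_count), a time-bucketed shard name can be sketched as follows; the naming scheme and bucketing rule here are assumptions for illustration, not the project's actual format:

```python
def shard_table_name(base_table, timestamp, interval_seconds=604800, table_count=14):
    # Hypothetical TWCS-style bucketing: bucket the timestamp by the interval,
    # then wrap the bucket index onto a fixed number of tables.
    bucket = int(timestamp) // interval_seconds
    return f"{base_table}_{bucket % table_count}"

# With a 7-day interval, timestamps one week apart land in adjacent tables,
# and the mapping wraps around after table_count buckets.
print(shard_table_name('events', 0))
print(shard_table_name('events', 604800))
```

With the defaults (604800 s = 7 days, 14 tables) the scheme covers a 14-week rolling window before bucket names repeat, which is consistent with the mixed-query defaults above.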

modules/query_logger.py Normal file

@@ -0,0 +1,304 @@
"""
查询日志管理模块
================
本模块提供BigDataTool的完整查询日志管理功能支持实时日志收集和历史日志分析。
核心功能:
1. 实时日志收集:自动收集所有查询操作的详细日志
2. 批次管理:按查询批次组织日志,便于追踪完整的查询流程
3. 双重存储:内存缓存 + SQLite持久化存储
4. 历史关联:将日志与查询历史记录关联,支持完整的操作回溯
5. 性能监控:记录查询时间、记录数等性能指标
日志收集特性:
- 多级日志支持INFO、WARNING、ERROR等日志级别
- 批次追踪每个查询批次分配唯一ID便于日志分组
- 时间戳:精确到毫秒的时间戳记录
- 查询类型区分单表、分表、Redis等不同查询类型
- 历史关联:支持日志与查询历史记录的双向关联
存储策略:
- 内存缓存:最近的日志保存在内存中,支持快速访问
- 数据库持久化所有日志自动保存到SQLite数据库
- 容量控制:内存缓存有容量限制,自动清理旧日志
- 事务安全:数据库写入失败不影响程序运行
查询和分析:
- 按批次查询:支持按查询批次获取相关日志
- 按历史记录查询支持按历史记录ID获取相关日志
- 分页支持:大量日志的分页显示
- 时间范围:支持按时间范围筛选日志
- 日志清理:支持按时间清理旧日志
作者BigDataTool项目组
更新时间2024年8月
"""
import sqlite3
import logging
from datetime import datetime, timedelta
from .database import DATABASE_PATH
logger = logging.getLogger(__name__)
class QueryLogCollector:
def __init__(self, max_logs=1000, db_path=None):
self.logs = []  # in-memory log cache
self.max_logs = max_logs
self.current_batch_id = None
self.batch_counter = 0
self.current_query_type = 'single'
self.current_history_id = None  # history record id tied to the current batch
self.db_path = db_path or DATABASE_PATH
def start_new_batch(self, query_type='single'):
"""开始新的查询批次"""
self.batch_counter += 1
self.current_batch_id = f"batch_{self.batch_counter}_{datetime.now().strftime('%H%M%S')}"
self.current_query_type = query_type
self.current_history_id = None  # reset the associated history id
# Emit a batch-start marker
self.add_log('INFO', f"=== 开始{query_type}查询批次 (ID: {self.current_batch_id}) ===", force_batch_id=self.current_batch_id)
return self.current_batch_id
def set_history_id(self, history_id):
"""设置当前批次关联的历史记录ID"""
self.current_history_id = history_id
if self.current_batch_id and history_id:
self.add_log('INFO', f"关联历史记录ID: {history_id}", force_batch_id=self.current_batch_id)
# Backfill history_id on every log row already written for this batch
self._update_batch_history_id(self.current_batch_id, history_id)
def _update_batch_history_id(self, batch_id, history_id):
"""更新批次中所有日志的history_id"""
try:
conn = sqlite3.connect(self.db_path, timeout=30)
cursor = conn.cursor()
cursor.execute('''
UPDATE query_logs
SET history_id = ?
WHERE batch_id = ?
''', (history_id, batch_id))
conn.commit()
conn.close()
logger.info(f"已更新批次 {batch_id} 的历史记录关联到 {history_id}")
except Exception as e:
print(f"Warning: Failed to update batch history_id: {e}")
def end_current_batch(self):
"""结束当前查询批次"""
if self.current_batch_id:
self.add_log('INFO', f"=== 查询批次完成 (ID: {self.current_batch_id}) ===", force_batch_id=self.current_batch_id)
self.current_batch_id = None
self.current_history_id = None
def add_log(self, level, message, force_batch_id=None, force_query_type=None, force_history_id=None):
"""添加日志到内存和数据库"""
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
batch_id = force_batch_id or self.current_batch_id
query_type = force_query_type or self.current_query_type
history_id = force_history_id or self.current_history_id
log_entry = {
'timestamp': timestamp,
'level': level,
'message': message,
'batch_id': batch_id,
'query_type': query_type,
'history_id': history_id
}
# In-memory cache (bounded: the oldest entry is evicted)
self.logs.append(log_entry)
if len(self.logs) > self.max_logs:
self.logs.pop(0)
# Persist to the database
self._save_log_to_db(log_entry)
def _save_log_to_db(self, log_entry):
"""将日志保存到数据库"""
try:
conn = sqlite3.connect(self.db_path, timeout=30)
cursor = conn.cursor()
cursor.execute('''
INSERT INTO query_logs (batch_id, history_id, timestamp, level, message, query_type)
VALUES (?, ?, ?, ?, ?, ?)
''', (
log_entry['batch_id'],
log_entry['history_id'],
log_entry['timestamp'],
log_entry['level'],
log_entry['message'],
log_entry['query_type']
))
conn.commit()
conn.close()
except Exception as e:
# A failed database write is reported on the console but never aborts the program
print(f"Warning: Failed to save log to database: {e}")
def get_logs(self, limit=None, from_db=True):
"""获取日志,支持从数据库或内存获取"""
if from_db:
return self._get_logs_from_db(limit)
else:
# From memory
if limit:
return self.logs[-limit:]
return self.logs
def _get_logs_from_db(self, limit=None):
"""从数据库获取日志"""
try:
conn = sqlite3.connect(self.db_path, timeout=30)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
query = '''
SELECT batch_id, history_id, timestamp, level, message, query_type
FROM query_logs
ORDER BY id DESC
'''
if limit:
query += f' LIMIT {int(limit)}'
cursor.execute(query)
rows = cursor.fetchall()
# Convert rows to dicts and restore chronological order (the DESC query fetched the newest rows first)
logs = []
for row in reversed(rows):
logs.append({
'batch_id': row['batch_id'],
'history_id': row['history_id'],
'timestamp': row['timestamp'],
'level': row['level'],
'message': row['message'],
'query_type': row['query_type']
})
conn.close()
return logs
except Exception as e:
print(f"Warning: Failed to get logs from database: {e}")
# Fall back to the in-memory logs if the database read fails
return self.get_logs(limit, from_db=False)
def _get_total_logs_count(self):
"""获取数据库中的日志总数"""
try:
conn = sqlite3.connect(self.db_path, timeout=30)
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) FROM query_logs')
count = cursor.fetchone()[0]
conn.close()
return count
except Exception as e:
print(f"Warning: Failed to get logs count from database: {e}")
return len(self.logs)
def get_logs_by_history_id(self, history_id):
"""根据历史记录ID获取相关日志"""
try:
conn = sqlite3.connect(self.db_path, timeout=30)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
cursor.execute('''
SELECT batch_id, history_id, timestamp, level, message, query_type
FROM query_logs
WHERE history_id = ?
ORDER BY id ASC
''', (history_id,))
rows = cursor.fetchall()
logs = []
for row in rows:
logs.append({
'batch_id': row['batch_id'],
'history_id': row['history_id'],
'timestamp': row['timestamp'],
'level': row['level'],
'message': row['message'],
'query_type': row['query_type']
})
conn.close()
return logs
except Exception as e:
print(f"Warning: Failed to get logs by history_id: {e}")
return []
def get_logs_grouped_by_batch(self, limit=None, from_db=True):
"""按批次分组获取日志"""
logs = self.get_logs(limit, from_db)
grouped_logs = {}
batch_order = []
for log in logs:
batch_id = log.get('batch_id', 'unknown')
if batch_id not in grouped_logs:
grouped_logs[batch_id] = []
batch_order.append(batch_id)
grouped_logs[batch_id].append(log)
# Batches in chronological order
return [(batch_id, grouped_logs[batch_id]) for batch_id in batch_order]
def clear_logs(self, clear_db=True):
"""清空日志"""
# 清空内存
self.logs.clear()
self.current_batch_id = None
self.batch_counter = 0
# Database
if clear_db:
try:
conn = sqlite3.connect(self.db_path, timeout=30)
cursor = conn.cursor()
cursor.execute('DELETE FROM query_logs')
conn.commit()
conn.close()
except Exception as e:
print(f"Warning: Failed to clear logs from database: {e}")
def cleanup_old_logs(self, days_to_keep=30):
"""清理旧日志,保留指定天数的日志"""
try:
conn = sqlite3.connect(self.db_path, timeout=30)
cursor = conn.cursor()
# Remove logs past the cutoff
cutoff_date = datetime.now() - timedelta(days=days_to_keep)
cursor.execute('''
DELETE FROM query_logs
WHERE created_at < ?
''', (cutoff_date.strftime('%Y-%m-%d %H:%M:%S'),))
deleted_count = cursor.rowcount
conn.commit()
conn.close()
logger.info(f"清理了 {deleted_count} 条超过 {days_to_keep} 天的旧日志")
return deleted_count
except Exception as e:
logger.error(f"清理旧日志失败: {e}")
return 0
# Custom logging handler that forwards records into the collector
class CollectorHandler(logging.Handler):
def __init__(self, collector):
super().__init__()
self.collector = collector
def emit(self, record):
self.collector.add_log(record.levelname, record.getMessage())
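The CollectorHandler above plugs straight into the standard logging pipeline. This self-contained sketch, with a minimal stand-in collector (the `MemoryCollector` name is hypothetical), shows a record flowing from a logger into the collector:

```python
import logging

class MemoryCollector:
    # Minimal stand-in for QueryLogCollector: records (level, message) pairs.
    def __init__(self):
        self.logs = []
    def add_log(self, level, message):
        self.logs.append((level, message))

class CollectorHandler(logging.Handler):
    def __init__(self, collector):
        super().__init__()
        self.collector = collector
    def emit(self, record):
        # getMessage() applies the %-style args before forwarding.
        self.collector.add_log(record.levelname, record.getMessage())

collector = MemoryCollector()
demo_logger = logging.getLogger('collector_demo')
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(CollectorHandler(collector))
demo_logger.info('query finished in %.3fs', 0.042)
print(collector.logs)
```

Because the handler sits on the logging framework, every existing `logger.info(...)` call in the modules above is captured without changing any call sites.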

modules/redis_client.py Normal file

@@ -0,0 +1,279 @@
"""
Redis连接管理模块
===================
本模块提供Redis集群的连接管理和基础操作功能支持单节点和集群模式。
核心功能:
1. 智能连接管理:自动检测单节点和集群模式
2. 连接池优化:高效的连接复用和资源管理
3. 错误处理:完善的连接失败诊断和重试机制
4. 性能监控:连接时间和操作性能的实时监控
5. 类型检测自动识别Redis数据类型
连接特性:
- 自适应模式:根据节点数量自动选择连接方式
- 连接池管理:每个节点独立的连接池配置
- 超时控制:可配置的连接和操作超时时间
- 密码认证支持Redis AUTH认证
- 健康检查:连接状态的实时监控
支持的Redis版本
- Redis 5.0+:完整功能支持
- Redis Cluster集群模式支持
- Redis Sentinel哨兵模式支持通过配置
错误诊断:
- 连接超时:网络延迟和服务器负载分析
- 认证失败:密码验证和权限检查
- 集群错误:节点状态和集群配置验证
- 数据类型错误:类型检测和转换建议
作者BigDataTool项目组
更新时间2024年8月
"""
import time
import logging
import redis
from redis.cluster import RedisCluster, ClusterNode, key_slot
from redis.exceptions import RedisError, ConnectionError
logger = logging.getLogger(__name__)
class RedisPerformanceTracker:
"""Redis操作性能统计追踪器"""
def __init__(self):
self.connection_times = {}  # connection durations
self.query_times = {}  # query durations
self.comparison_time = 0  # comparison duration
self.scan_time = 0  # SCAN phase duration
self.connection_status = {}  # connection status per cluster
self.start_time = time.time()
def record_connection(self, cluster_name, start_time, end_time, success, error_msg=None):
"""记录连接信息"""
self.connection_times[cluster_name] = end_time - start_time
self.connection_status[cluster_name] = {
'success': success,
'error_msg': error_msg,
'connect_time': end_time - start_time
}
def record_query(self, operation_name, duration):
"""记录查询操作耗时"""
self.query_times[operation_name] = duration
def record_scan_time(self, duration):
"""记录scan操作耗时"""
self.scan_time = duration
def record_comparison_time(self, duration):
"""记录比对耗时"""
self.comparison_time = duration
def get_total_time(self):
"""获取总耗时"""
return time.time() - self.start_time
def generate_report(self):
"""生成性能报告"""
total_time = self.get_total_time()
report = {
'total_time': total_time,
'connections': self.connection_status,
'operations': {
'scan_time': self.scan_time,
'comparison_time': self.comparison_time,
'queries': self.query_times
}
}
return report
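The report above has a fixed shape. A stripped-down sketch of the same bookkeeping, with one connection attempt and no real network calls (the `cluster1` name is illustrative), looks like:

```python
import time

# Stripped-down version of RedisPerformanceTracker's bookkeeping.
start = time.time()
connection_status = {}

t0 = time.time()
# ... a real connection handshake would happen here ...
t1 = time.time()
connection_status['cluster1'] = {
    'success': True, 'error_msg': None, 'connect_time': t1 - t0}

report = {
    'total_time': time.time() - start,
    'connections': connection_status,
    'operations': {'scan_time': 0.0, 'comparison_time': 0.0, 'queries': {}},
}
print(report['connections']['cluster1']['success'])
```

Keeping the report a plain dict of floats and flags means it can be serialized to JSON as-is for the web UI.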
def create_redis_client(cluster_config, cluster_name="Redis集群", performance_tracker=None):
"""
创建Redis客户端自动检测单节点或集群模式
Args:
cluster_config: Redis配置
cluster_name: 集群名称用于日志
performance_tracker: 性能追踪器
Returns:
Redis客户端实例或None
"""
start_time = time.time()
try:
# Node configuration
nodes = cluster_config.get('nodes', [])
if not nodes:
raise RedisError("未配置Redis节点")
# Connection parameters shared by both modes
common_params = {
'password': cluster_config.get('password'),
'socket_timeout': cluster_config.get('socket_timeout', 3),
'socket_connect_timeout': cluster_config.get('socket_connect_timeout', 3),
'decode_responses': False,  # keep raw bytes
'retry_on_timeout': True
}
logger.info(f"正在连接{cluster_name}...")
logger.info(f"节点配置: {[(node['host'], node['port']) for node in nodes]}")
# First try standalone mode against the first node
first_node = nodes[0]
try:
logger.info(f"尝试单节点模式连接: {first_node['host']}:{first_node['port']}")
single_client = redis.Redis(
host=first_node['host'],
port=first_node['port'],
**common_params
)
# Probe the connection
single_client.ping()
# Is cluster mode enabled on this node?
try:
info = single_client.info()
cluster_enabled = info.get('cluster_enabled', 0)
if cluster_enabled == 1:
# Cluster node: close the standalone connection and switch to the cluster client
logger.info("检测到集群模式已启用,切换到集群客户端")
single_client.close()
return _create_cluster_client(cluster_config, cluster_name, performance_tracker, start_time, common_params)
else:
# Standalone mode works
end_time = time.time()
connection_time = end_time - start_time
if performance_tracker:
performance_tracker.record_connection(cluster_name, start_time, end_time, True)
logger.info(f"{cluster_name}连接成功(单节点模式),耗时 {connection_time:.3f}")
return single_client
except Exception as info_error:
# INFO failed but PING succeeded: stay in standalone mode
logger.warning(f"无法获取集群信息,继续使用单节点模式: {info_error}")
end_time = time.time()
connection_time = end_time - start_time
if performance_tracker:
performance_tracker.record_connection(cluster_name, start_time, end_time, True)
logger.info(f"{cluster_name}连接成功(单节点模式),耗时 {connection_time:.3f}")
return single_client
except Exception as single_error:
logger.warning(f"单节点模式连接失败: {single_error}")
logger.info("尝试集群模式连接...")
# Standalone mode failed: try cluster mode
return _create_cluster_client(cluster_config, cluster_name, performance_tracker, start_time, common_params)
except Exception as e:
end_time = time.time()
connection_time = end_time - start_time
error_msg = f"连接失败: {str(e)}"
if performance_tracker:
performance_tracker.record_connection(cluster_name, start_time, end_time, False, error_msg)
logger.error(f"{cluster_name}{error_msg},耗时 {connection_time:.3f}")
return None
def _create_cluster_client(cluster_config, cluster_name, performance_tracker, start_time, common_params):
"""创建集群客户端"""
try:
# Startup node list
startup_nodes = []
for node in cluster_config.get('nodes', []):
startup_nodes.append(ClusterNode(node['host'], node['port']))
# Cluster client
cluster_client = RedisCluster(
startup_nodes=startup_nodes,
max_connections_per_node=cluster_config.get('max_connections_per_node', 16),
skip_full_coverage_check=True,  # skip the full-coverage check so partially unavailable nodes are tolerated
**common_params
)
# Probe the cluster connection
cluster_client.ping()
end_time = time.time()
connection_time = end_time - start_time
if performance_tracker:
performance_tracker.record_connection(cluster_name, start_time, end_time, True)
logger.info(f"{cluster_name}连接成功(集群模式),耗时 {connection_time:.3f}")
return cluster_client
except Exception as cluster_error:
end_time = time.time()
connection_time = end_time - start_time
error_msg = f"集群模式连接失败: {str(cluster_error)}"
if performance_tracker:
performance_tracker.record_connection(cluster_name, start_time, end_time, False, error_msg)
logger.error(f"{cluster_name}{error_msg},耗时 {connection_time:.3f}")
return None
def test_redis_connection(cluster_config, cluster_name="Redis集群"):
"""
测试Redis连接
Args:
cluster_config: Redis集群配置
cluster_name: 集群名称
Returns:
dict: 连接测试结果
"""
result = {
'success': False,
'error': None,
'connection_time': 0,
'cluster_info': None
}
start_time = time.time()
client = None
try:
client = create_redis_client(cluster_config, cluster_name)
if client:
# Collect server info
info = client.info()
cluster_info = {
'redis_version': info.get('redis_version', 'Unknown'),
'connected_clients': info.get('connected_clients', 0),
'used_memory_human': info.get('used_memory_human', 'Unknown'),
'keyspace_hits': info.get('keyspace_hits', 0),
'keyspace_misses': info.get('keyspace_misses', 0)
}
result['success'] = True
result['cluster_info'] = cluster_info
else:
result['error'] = "连接创建失败"
except Exception as e:
result['error'] = str(e)
finally:
result['connection_time'] = time.time() - start_time
if client:
try:
client.close()
except:
pass
return result
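The standalone-versus-cluster decision in create_redis_client boils down to one INFO field. Isolated as a pure function for illustration (`pick_client_mode` is a hypothetical helper name; `cluster_enabled` is the field from the Cluster section of Redis INFO):

```python
def pick_client_mode(info):
    # cluster_enabled == 1 means the node is part of a Redis Cluster,
    # so the cluster client must be used; anything else means standalone.
    return 'cluster' if info.get('cluster_enabled', 0) == 1 else 'single'

print(pick_client_mode({'cluster_enabled': 1}))
print(pick_client_mode({'redis_version': '7.2.4'}))
```

Defaulting to standalone when the flag is absent matches the fallback above, where a failed INFO call keeps the already-working standalone connection.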

modules/redis_query.py Normal file

@@ -0,0 +1,672 @@
"""
Redis查询引擎模块
=================
本模块是Redis数据比对的核心引擎提供高级的Redis数据查询和比较功能。
核心功能:
1. 多模式查询随机采样和指定Key两种查询模式
2. 全类型支持支持所有Redis数据类型的查询和比较
3. 智能比较:针对不同数据类型的专门比较算法
4. 性能监控:详细的查询时间和性能统计
5. 错误容错单个Key查询失败不影响整体结果
查询模式:
- 随机采样从源集群随机获取指定数量的Key进行比对
- 指定Key对用户提供的Key列表进行精确比对
- 模式匹配支持通配符模式的Key筛选
支持的数据类型:
- String字符串类型自动检测JSON格式
- Hash哈希表字段级别的深度比较
- List列表保持元素顺序的精确比较
- Set集合自动排序后的内容比较
- ZSet有序集合包含分数的完整比较
- Stream消息流消息级别的详细比较
比较算法:
- JSON智能比较自动检测和比较JSON格式数据
- 类型一致性检查:确保两个集群中数据类型一致
- 内容深度比较:递归比较复杂数据结构
- 性能优化:大数据集的高效比较算法
统计分析:
- 一致性统计相同、不同、缺失Key的详细统计
- 类型分布:各种数据类型的分布统计
- 性能指标:查询时间、连接时间等性能数据
- 错误分析:查询失败的详细错误统计
作者BigDataTool项目组
更新时间2024年8月
"""
import time
import logging
import random
from redis.cluster import key_slot
from redis.exceptions import RedisError
from .redis_client import RedisPerformanceTracker
logger = logging.getLogger(__name__)
# Import the query log collector
try:
from app import query_log_collector
except ImportError:
# Fall back to a no-op collector if the import fails (avoids a circular import at startup)
class DummyQueryLogCollector:
def start_new_batch(self, query_type):
return None
def end_current_batch(self):
pass
def set_history_id(self, history_id):
pass
def add_log(self, level, message):
pass
query_log_collector = DummyQueryLogCollector()
def _get_redis_command_by_type(redis_type):
"""根据Redis数据类型返回对应的查询命令"""
command_map = {
'string': 'GET',
'hash': 'HGETALL',
'list': 'LRANGE',
'set': 'SMEMBERS',
'zset': 'ZRANGE',
'stream': 'XRANGE'
}
return command_map.get(redis_type, 'TYPE')
def _get_data_summary(key_info):
"""获取数据内容的概要信息"""
if not key_info['exists']:
return "不存在"
key_type = key_info['type']
value = key_info['value']
try:
if key_type == 'string':
if isinstance(value, str):
if len(value) > 50:
return f"字符串({len(value)}字符): {value[:47]}..."
else:
return f"字符串: {value}"
else:
return f"字符串: {str(value)[:50]}..."
elif key_type == 'hash':
if isinstance(value, dict):
field_count = len(value)
sample_fields = list(value.keys())[:3]
fields_str = ", ".join(sample_fields)
if field_count > 3:
fields_str += "..."
return f"哈希({field_count}个字段): {fields_str}"
else:
return f"哈希: {str(value)[:50]}..."
elif key_type == 'list':
if isinstance(value, list):
list_len = len(value)
if list_len > 0:
first_item = str(value[0])[:20] if value[0] else ""
return f"列表({list_len}个元素): [{first_item}...]"
else:
return "列表(空)"
else:
return f"列表: {str(value)[:50]}..."
elif key_type == 'set':
if isinstance(value, (set, list)):
set_len = len(value)
if set_len > 0:
first_item = str(list(value)[0])[:20] if value else ""
return f"集合({set_len}个元素): {{{first_item}...}}"
else:
return "集合(空)"
else:
return f"集合: {str(value)[:50]}..."
elif key_type == 'zset':
if isinstance(value, list):
zset_len = len(value)
if zset_len > 0:
first_item = f"{value[0][0]}:{value[0][1]}" if value[0] else ""
return f"有序集合({zset_len}个元素): {{{first_item}...}}"
else:
return "有序集合(空)"
else:
return f"有序集合: {str(value)[:50]}..."
elif key_type == 'stream':
if isinstance(value, list):
stream_len = len(value)
return f"流({stream_len}条消息)"
else:
return f"流: {str(value)[:50]}..."
else:
return f"未知类型: {str(value)[:50]}..."
except Exception as e:
return f"解析错误: {str(e)[:30]}..."
def get_random_keys_from_redis(redis_client, count=100, pattern="*", performance_tracker=None):
"""
从Redis集群中获取随机keys
Args:
redis_client: Redis客户端
count: 要获取的key数量
pattern: key匹配模式默认为 "*"
performance_tracker: 性能追踪器
Returns:
list: 随机key列表
"""
start_time = time.time()
keys = set()
logger.info(f"开始扫描获取随机keys目标数量: {count},模式: {pattern}")
query_log_collector.add_log('INFO', f"🔍 开始扫描Key目标数量: {count},匹配模式: '{pattern}'")
try:
# Collect keys with scan_iter
scan_count = max(count * 2, 1000)  # over-scan to improve randomness
query_log_collector.add_log('INFO', f"📡 执行SCAN命令扫描批次大小: {scan_count}")
scan_iterations = 0
for key in redis_client.scan_iter(match=pattern, count=scan_count):
keys.add(key)
scan_iterations += 1
# Log progress every 1000 scanned keys
if scan_iterations % 1000 == 0:
query_log_collector.add_log('INFO', f"📊 扫描进度: 已发现 {len(keys)} 个匹配的Key")
if len(keys) >= count * 3:  # stop once there are enough keys for a random pick
break
total_found = len(keys)
query_log_collector.add_log('INFO', f"🎯 扫描完成,共发现 {total_found} 个匹配的Key")
# If more keys were found than needed, sample randomly
if len(keys) > count:
keys = random.sample(list(keys), count)
query_log_collector.add_log('INFO', f"🎲 从 {total_found} 个Key中随机选择 {count}")
else:
keys = list(keys)
if total_found < count:
query_log_collector.add_log('WARNING', f"⚠️ 实际找到的Key数量({total_found})少于目标数量({count})")
# Log a sample of the selected keys (first 10)
key_sample = keys[:10] if len(keys) > 10 else keys
key_list_str = ", ".join([f"'{k}'" for k in key_sample])
if len(keys) > 10:
key_list_str += f" ... (共{len(keys)}个)"
query_log_collector.add_log('INFO', f"📋 选中的Key样本: [{key_list_str}]")
end_time = time.time()
scan_duration = end_time - start_time
if performance_tracker:
performance_tracker.record_scan_time(scan_duration)
logger.info(f"扫描获取 {len(keys)} 个随机keys耗时 {scan_duration:.3f}")
query_log_collector.add_log('INFO', f"✅ Key扫描完成最终获取 {len(keys)} 个keys总耗时 {scan_duration:.3f}")
return keys
except RedisError as e:
end_time = time.time()
scan_duration = end_time - start_time
if performance_tracker:
performance_tracker.record_scan_time(scan_duration)
logger.error(f"获取随机keys失败: {e},耗时 {scan_duration:.3f}")
query_log_collector.add_log('ERROR', f"获取随机keys失败: {e},耗时 {scan_duration:.3f}")
return []
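The over-scan-then-downsample behaviour above can be sketched in isolation (the function and key names here are illustrative, not part of the module):

```python
import random

def pick_random_sample(found_keys, count):
    # Deduplicate (SCAN may return a key more than once), then down-sample
    # to the requested count, mirroring get_random_keys_from_redis above.
    keys = set(found_keys)
    if len(keys) > count:
        return random.sample(list(keys), count)
    return list(keys)

sample = pick_random_sample([f"user_{i}" for i in range(500)], 100)
print(len(sample))  # 100
```

When fewer keys match than requested, the whole deduplicated set is returned, which matches the WARNING branch above.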
def get_redis_values_by_keys(redis_client, keys, cluster_name="Redis集群", performance_tracker=None):
"""
批量查询Redis中指定keys的值支持所有Redis数据类型String、Hash、List、Set、ZSet等
Args:
redis_client: Redis客户端
keys: 要查询的key列表
cluster_name: 集群名称用于日志
performance_tracker: 性能追踪器
Returns:
list: 对应keys的值信息字典列表包含类型、值和显示格式
"""
from .redis_types import get_redis_value_with_type
start_time = time.time()
result = []
logger.info(f"开始从{cluster_name}批量查询 {len(keys)} 个keys支持所有数据类型")
query_log_collector.add_log('INFO', f"📊 开始从{cluster_name}批量查询 {len(keys)} 个keys支持所有数据类型")
# Log the keys to query (first 10, to keep logs short)
key_sample = keys[:10] if len(keys) > 10 else keys
key_list_str = ", ".join([f"'{k}'" for k in key_sample])
if len(keys) > 10:
key_list_str += f" ... (共{len(keys)}个)"
query_log_collector.add_log('INFO', f"🔍 查询Key列表: [{key_list_str}]")
try:
# Query keys one by one; every Redis data type is supported
redis_commands_used = {}  # track which Redis commands were used
for i, key in enumerate(keys):
key_start_time = time.time()
key_info = get_redis_value_with_type(redis_client, key)
key_duration = time.time() - key_start_time
result.append(key_info)
# Log per-key query details
if key_info['exists']:
key_type = key_info['type']
# Determine the Redis command used for this type
redis_cmd = _get_redis_command_by_type(key_type)
redis_commands_used[redis_cmd] = redis_commands_used.get(redis_cmd, 0) + 1
# Build a summary of the data content
data_summary = _get_data_summary(key_info)
query_log_collector.add_log('INFO',
f"✅ Key '{key}' | 类型: {key_type} | 命令: {redis_cmd} | 数据: {data_summary} | 耗时: {key_duration:.3f}s")
else:
query_log_collector.add_log('WARNING',
f"❌ Key '{key}' | 状态: 不存在 | 耗时: {key_duration:.3f}s")
end_time = time.time()
query_duration = end_time - start_time
if performance_tracker:
performance_tracker.record_query(f"{cluster_name}_typed_batch_query", query_duration)
# Count successfully fetched keys and their type distribution
successful_count = sum(1 for r in result if r['exists'])
type_stats = {}
for r in result:
if r['exists']:
key_type = r['type']
type_stats[key_type] = type_stats.get(key_type, 0) + 1
# Log Redis command usage statistics
cmd_stats = ", ".join([f"{cmd}: {count}" for cmd, count in redis_commands_used.items()]) if redis_commands_used else ""
type_info = ", ".join([f"{t}: {c}" for t, c in type_stats.items()]) if type_stats else ""
query_log_collector.add_log('INFO', f"🎯 Redis命令统计: [{cmd_stats}]")
query_log_collector.add_log('INFO', f"📈 从{cluster_name}查询完成,成功获取 {successful_count}/{len(keys)} 个值,数据类型分布: [{type_info}],总耗时 {query_duration:.3f}")
return result
except Exception as e:
end_time = time.time()
query_duration = end_time - start_time
if performance_tracker:
performance_tracker.record_query(f"{cluster_name}_typed_batch_query_error", query_duration)
logger.error(f"{cluster_name}批量查询失败: {e},耗时 {query_duration:.3f}")
query_log_collector.add_log('ERROR', f"{cluster_name}批量查询失败: {e},耗时 {query_duration:.3f}")
# Return error placeholders
return [{'type': 'error', 'value': None, 'display_value': f'<error: {e}>', 'exists': False} for _ in keys]
def compare_redis_data(client1, client2, keys, cluster1_name="生产集群", cluster2_name="测试集群", performance_tracker=None):
"""
比较两个Redis集群中指定keys的数据支持所有Redis数据类型
Args:
client1: 第一个Redis客户端生产
client2: 第二个Redis客户端测试
keys: 要比较的key列表
cluster1_name: 第一个集群名称
cluster2_name: 第二个集群名称
performance_tracker: 性能追踪器
Returns:
dict: 比较结果,包含统计信息和差异详情
"""
from .redis_types import compare_redis_values
comparison_start_time = time.time()
logger.info(f"开始比较 {cluster1_name}{cluster2_name} 的数据支持所有Redis数据类型")
query_log_collector.add_log('INFO', f"🔄 开始比较 {cluster1_name}{cluster2_name} 的数据支持所有Redis数据类型")
query_log_collector.add_log('INFO', f"📊 比较范围: {len(keys)} 个Key")
# Fetch data from both clusters
query_log_collector.add_log('INFO', f"📥 第一步: 从{cluster1_name}获取数据")
values1 = get_redis_values_by_keys(client1, keys, cluster1_name, performance_tracker)
if not values1:
error_msg = f'{cluster1_name}获取数据失败'
query_log_collector.add_log('ERROR', f"{error_msg}")
return {'error': error_msg}
query_log_collector.add_log('INFO', f"📥 第二步: 从{cluster2_name}获取数据")
values2 = get_redis_values_by_keys(client2, keys, cluster2_name, performance_tracker)
if not values2:
error_msg = f'{cluster2_name}获取数据失败'
query_log_collector.add_log('ERROR', f"{error_msg}")
return {'error': error_msg}
# Start the data comparison
compare_start = time.time()
query_log_collector.add_log('INFO', f"🔍 第三步: 开始逐个比较Key的数据内容")
# Initialise comparison statistics
stats = {
'total_keys': len(keys),
'identical_count': 0,
'different_count': 0,
'missing_in_cluster1': 0,
'missing_in_cluster2': 0,
'both_missing': 0
}
# Detailed result lists
identical_results = []
different_results = []
missing_results = []
# Compare key by key
comparison_details = []  # per-key comparison details
for i, key in enumerate(keys):
key_str = key.decode('utf-8') if isinstance(key, bytes) else key
value1_info = values1[i]
value2_info = values2[i]
# Use the comparison function from the redis_types module
comparison_result = compare_redis_values(value1_info, value2_info)
# Record comparison details
comparison_detail = {
'key': key_str,
'cluster1_exists': value1_info['exists'],
'cluster2_exists': value2_info['exists'],
'cluster1_type': value1_info.get('type'),
'cluster2_type': value2_info.get('type'),
'status': comparison_result['status']
}
comparison_details.append(comparison_detail)
if comparison_result['status'] == 'both_missing':
stats['both_missing'] += 1
missing_results.append({
'key': key_str,
'status': 'both_missing',
'message': comparison_result['message']
})
query_log_collector.add_log('WARNING', f"⚠️ Key '{key_str}': 两个集群都不存在")
elif comparison_result['status'] == 'missing_in_cluster1':
stats['missing_in_cluster1'] += 1
missing_results.append({
'key': key_str,
'status': 'missing_in_cluster1',
'cluster1_value': None,
'cluster2_value': value2_info['display_value'],
'cluster2_type': value2_info['type'],
'message': comparison_result['message']
})
query_log_collector.add_log('WARNING', f"❌ Key '{key_str}': 仅在{cluster2_name}存在 (类型: {value2_info['type']})")
elif comparison_result['status'] == 'missing_in_cluster2':
stats['missing_in_cluster2'] += 1
missing_results.append({
'key': key_str,
'status': 'missing_in_cluster2',
'cluster1_value': value1_info['display_value'],
'cluster1_type': value1_info['type'],
'cluster2_value': None,
'message': comparison_result['message']
})
query_log_collector.add_log('WARNING', f"❌ Key '{key_str}': 仅在{cluster1_name}存在 (类型: {value1_info['type']})")
elif comparison_result['status'] == 'identical':
stats['identical_count'] += 1
identical_results.append({
'key': key_str,
'value': value1_info['display_value'],
'type': value1_info['type']
})
query_log_collector.add_log('INFO', f"✅ Key '{key_str}': 数据一致 (类型: {value1_info['type']})")
else: # different
stats['different_count'] += 1
different_results.append({
'key': key_str,
'cluster1_value': value1_info['display_value'],
'cluster1_type': value1_info['type'],
'cluster2_value': value2_info['display_value'],
'cluster2_type': value2_info['type'],
'message': comparison_result['message']
})
# Record difference details
type_info = f"{value1_info['type']} vs {value2_info['type']}" if value1_info['type'] != value2_info['type'] else value1_info['type']
query_log_collector.add_log('WARNING', f"🔄 Key '{key_str}': 数据不一致 (类型: {type_info}) - {comparison_result['message']}")
# Log progress every 100 processed keys
if (i + 1) % 100 == 0:
progress = f"{i + 1}/{len(keys)}"
query_log_collector.add_log('INFO', f"📊 比较进度: {progress} ({((i + 1) / len(keys) * 100):.1f}%)")
compare_end = time.time()
comparison_duration = compare_end - compare_start
total_duration = compare_end - comparison_start_time
if performance_tracker:
performance_tracker.record_comparison_time(comparison_duration)
# Percentage helper
def safe_percentage(part, total):
return round((part / total * 100), 2) if total > 0 else 0
stats['identical_percentage'] = safe_percentage(stats['identical_count'], stats['total_keys'])
stats['different_percentage'] = safe_percentage(stats['different_count'], stats['total_keys'])
stats['missing_percentage'] = safe_percentage(
stats['missing_in_cluster1'] + stats['missing_in_cluster2'] + stats['both_missing'],
stats['total_keys']
)
result = {
'success': True,
'stats': stats,
'identical_results': identical_results,
'different_results': different_results,
'missing_results': missing_results,
'performance': {
'comparison_time': comparison_duration,
'total_time': total_duration
},
'clusters': {
'cluster1_name': cluster1_name,
'cluster2_name': cluster2_name
}
}
# Log a detailed comparison summary
query_log_collector.add_log('INFO', f"🎯 数据比对完成,纯比较耗时 {comparison_duration:.3f} 秒,总耗时 {total_duration:.3f}")
# Log the statistics
query_log_collector.add_log('INFO', f"📊 比对统计总览:")
query_log_collector.add_log('INFO', f" • 总Key数量: {stats['total_keys']}")
query_log_collector.add_log('INFO', f" • ✅ 数据一致: {stats['identical_count']} ({stats['identical_percentage']}%)")
query_log_collector.add_log('INFO', f" • 🔄 数据不同: {stats['different_count']} ({stats['different_percentage']}%)")
query_log_collector.add_log('INFO', f" • ❌ 仅{cluster1_name}存在: {stats['missing_in_cluster2']}")
query_log_collector.add_log('INFO', f" • ❌ 仅{cluster2_name}存在: {stats['missing_in_cluster1']}")
query_log_collector.add_log('INFO', f" • ⚠️ 两集群都不存在: {stats['both_missing']}")
# Log performance information
if performance_tracker:
query_log_collector.add_log('INFO', f"⚡ 性能统计: 平均每Key比较耗时 {(comparison_duration / len(keys) * 1000):.2f}ms")
# Log details for every key
query_log_collector.add_log('INFO', f"📋 全部Key详细信息:")
# Tally the data-type distribution
type_distribution = {}
for detail in comparison_details:
key_str = detail['key']
cluster1_type = detail.get('cluster1_type', 'N/A')
cluster2_type = detail.get('cluster2_type', 'N/A')
status = detail.get('status', 'unknown')
# Tally the type distribution
if cluster1_type != 'N/A':
type_distribution[cluster1_type] = type_distribution.get(cluster1_type, 0) + 1
elif cluster2_type != 'N/A':
type_distribution[cluster2_type] = type_distribution.get(cluster2_type, 0) + 1
# Log each key's details
if status == 'identical':
query_log_collector.add_log('INFO', f"{key_str} → 类型: {cluster1_type}, 状态: 数据一致")
elif status == 'different':
type_info = cluster1_type if cluster1_type == cluster2_type else f"{cluster1_name}:{cluster1_type} vs {cluster2_name}:{cluster2_type}"
query_log_collector.add_log('INFO', f" 🔄 {key_str} → 类型: {type_info}, 状态: 数据不同")
elif status == 'missing_in_cluster1':
query_log_collector.add_log('INFO', f"{key_str} → 类型: {cluster2_type}, 状态: 仅在{cluster2_name}存在")
elif status == 'missing_in_cluster2':
query_log_collector.add_log('INFO', f"{key_str} → 类型: {cluster1_type}, 状态: 仅在{cluster1_name}存在")
elif status == 'both_missing':
query_log_collector.add_log('INFO', f" ⚠️ {key_str} → 类型: N/A, 状态: 两集群都不存在")
# Log the type-distribution statistics
if type_distribution:
query_log_collector.add_log('INFO', f"📊 数据类型分布统计:")
for data_type, count in sorted(type_distribution.items()):
percentage = (count / len(keys)) * 100
query_log_collector.add_log('INFO', f"{data_type}: {count} 个 ({percentage:.1f}%)")
# 记录Key列表摘要
key_summary = [detail['key'] for detail in comparison_details[:10]] # 显示前10个key
key_list_str = ', '.join(key_summary)
if len(comparison_details) > 10:
key_list_str += f" ... (共{len(comparison_details)}个Key)"
query_log_collector.add_log('INFO', f"📝 Key列表摘要: [{key_list_str}]")
logger.info(f"数据比对完成,耗时 {comparison_duration:.3f}")
logger.info(f"比对统计: 总计{stats['total_keys']}个key相同{stats['identical_count']}个,不同{stats['different_count']}个,缺失{stats['missing_in_cluster1'] + stats['missing_in_cluster2'] + stats['both_missing']}")
return result
def execute_redis_comparison(config1, config2, query_options):
"""
执行Redis数据比较的主要函数
Args:
config1: 第一个Redis集群配置
config2: 第二个Redis集群配置
query_options: 查询选项,包含查询模式和参数
Returns:
dict: 完整的比较结果
"""
from .redis_client import create_redis_client
# Create a performance tracker
performance_tracker = RedisPerformanceTracker()
cluster1_name = config1.get('name', '生产集群')
cluster2_name = config2.get('name', '测试集群')
logger.info(f"开始执行Redis数据比较: {cluster1_name} vs {cluster2_name}")
# Start a new query batch with query type 'redis'
batch_id = query_log_collector.start_new_batch('redis')
query_log_collector.add_log('INFO', f"🚀 开始执行Redis数据比较: {cluster1_name} vs {cluster2_name}")
query_log_collector.add_log('INFO', f"📋 查询批次ID: {batch_id}")
# Create connections
client1 = create_redis_client(config1, cluster1_name, performance_tracker)
client2 = create_redis_client(config2, cluster2_name, performance_tracker)
if not client1:
error_msg = f'{cluster1_name}连接失败'
query_log_collector.add_log('ERROR', error_msg)
return {'error': error_msg}
if not client2:
error_msg = f'{cluster2_name}连接失败'
query_log_collector.add_log('ERROR', error_msg)
return {'error': error_msg}
try:
# Collect the keys to compare
keys = []
query_mode = query_options.get('mode', 'random')
if query_mode == 'random':
# Fetch random keys
count = query_options.get('count', 100)
pattern = query_options.get('pattern', '*')
source_cluster = query_options.get('source_cluster', 'cluster2')  # default: sample from the second cluster
source_client = client2 if source_cluster == 'cluster2' else client1
source_name = cluster2_name if source_cluster == 'cluster2' else cluster1_name
logger.info(f"{source_name}随机获取 {count} 个keys")
query_log_collector.add_log('INFO', f"{source_name}随机获取 {count} 个keys")
keys = get_random_keys_from_redis(source_client, count, pattern, performance_tracker)
elif query_mode == 'specified':
# Use the specified keys
keys = query_options.get('keys', [])
# Convert str keys to bytes (Redis normally works with bytes)
keys = [k.encode('utf-8') if isinstance(k, str) else k for k in keys]
query_log_collector.add_log('INFO', f"使用指定的 {len(keys)} 个keys进行比较")
if not keys:
error_msg = '未获取到任何keys进行比较'
query_log_collector.add_log('ERROR', error_msg)
return {'error': error_msg}
logger.info(f"准备比较 {len(keys)} 个keys")
query_log_collector.add_log('INFO', f"准备比较 {len(keys)} 个keys")
# Run the comparison
comparison_result = compare_redis_data(
client1, client2, keys,
cluster1_name, cluster2_name,
performance_tracker
)
# Attach the performance report
comparison_result['performance_report'] = performance_tracker.generate_report()
comparison_result['query_options'] = query_options
comparison_result['batch_id'] = batch_id  # include the batch ID in the result
# Log the final outcome
if comparison_result.get('success'):
query_log_collector.add_log('INFO', f"🎉 Redis数据比较执行成功完成")
# End the current batch
query_log_collector.end_current_batch()
return comparison_result
except Exception as e:
logger.error(f"Redis数据比较执行失败: {e}")
query_log_collector.add_log('ERROR', f"💥 Redis数据比较执行失败: {e}")
# End the current batch
query_log_collector.end_current_batch()
return {'error': f'执行失败: {str(e)}', 'batch_id': batch_id}
finally:
# Close connections; use except Exception so control-flow exceptions are not swallowed
try:
if client1:
client1.close()
if client2:
client2.close()
except Exception:
pass
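The percentage bookkeeping in compare_redis_data reduces to a small helper; this is a standalone sketch whose dict keys match the stats fields used above:

```python
def summarize_stats(total_keys, identical, different, missing):
    # Same guard as safe_percentage in compare_redis_data:
    # avoid division by zero when no keys were compared.
    def pct(part):
        return round(part / total_keys * 100, 2) if total_keys > 0 else 0
    return {
        'identical_percentage': pct(identical),
        'different_percentage': pct(different),
        'missing_percentage': pct(missing),
    }

print(summarize_stats(200, 150, 30, 20))
# {'identical_percentage': 75.0, 'different_percentage': 15.0, 'missing_percentage': 10.0}
```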
modules/redis_types.py Normal file
@@ -0,0 +1,460 @@
"""
Redis数据类型支持增强模块
================================
本模块提供对Redis所有主要数据类型的完整支持包括
- String类型包括JSON字符串的智能检测和格式化
- Hash类型键值对映射
- List类型有序列表
- Set类型无序集合
- ZSet类型有序集合带分数
- Stream类型消息流完整支持消息解析和比较
主要功能:
1. get_redis_value_with_type() - 获取任意类型的Redis键值
2. compare_redis_values() - 智能比较不同数据类型的值
3. batch_get_redis_values_with_type() - 批量获取键值信息
设计特点:
- 类型安全自动检测并处理每种Redis数据类型
- 编码处理完善的UTF-8解码和二进制数据处理
- JSON支持智能识别和格式化JSON字符串
- Stream支持完整的Stream消息结构解析和比较
- 错误处理:优雅处理连接错误和数据异常
作者BigDataTool项目组
更新时间2024年8月
"""
import json
import logging
from redis.exceptions import RedisError
logger = logging.getLogger(__name__)
def get_redis_value_with_type(redis_client, key):
"""
获取Redis键值及其数据类型的完整信息
这是本模块的核心函数支持所有Redis数据类型的获取和解析。
它会自动检测键的类型然后使用相应的Redis命令获取数据
并进行适当的格式化处理。
Args:
redis_client: Redis客户端连接对象
key (str): 要查询的Redis键名
Returns:
dict: 包含以下字段的字典
- 'type' (str): Redis数据类型 ('string', 'hash', 'list', 'set', 'zset', 'stream')
- 'value': 解析后的原始值Python对象
- 'display_value' (str): 格式化后用于显示的字符串
- 'exists' (bool): 键是否存在
支持的数据类型处理:
- String: 自动检测JSON格式支持二进制数据
- Hash: 完整的字段映射UTF-8解码
- List: 有序列表,保持原始顺序
- Set: 无序集合,自动排序便于比较
- ZSet: 有序集合,包含成员和分数
- Stream: 完整的消息流解析,包含元数据和消息内容
异常处理:
- 连接异常:返回错误状态
- 编码异常:标记为二进制数据
- 数据异常:记录警告并提供基本信息
示例:
>>> result = get_redis_value_with_type(client, "user:example")
>>> print(result['type']) # 'string'
>>> print(result['value']) # 'John Doe'
>>> print(result['exists']) # True
"""
try:
# Check whether the key exists
if not redis_client.exists(key):
return {
'type': None,
'value': None,
'display_value': None,
'exists': False
}
# Determine the data type
key_type = redis_client.type(key).decode('utf-8')
result = {
'type': key_type,
'exists': True
}
if key_type == 'string':
# String handling - supports plain strings and smart JSON detection
value = redis_client.get(key)
if value:
try:
# Try UTF-8 decoding
str_value = value.decode('utf-8')
result['value'] = str_value
# Detect JSON and pretty-print it for display
try:
json_value = json.loads(str_value)
result['display_value'] = json.dumps(json_value, indent=2, ensure_ascii=False)
result['type'] = 'json_string'  # mark as a JSON string
except json.JSONDecodeError:
# Not JSON; display the string as-is
result['display_value'] = str_value
except UnicodeDecodeError:
# Binary data that cannot be UTF-8 decoded
result['value'] = value
result['display_value'] = f"<binary data: {len(value)} bytes>"
else:
# Empty string
result['value'] = ""
result['display_value'] = ""
elif key_type == 'hash':
# Hash type
hash_data = redis_client.hgetall(key)
decoded_hash = {}
for field, value in hash_data.items():
try:
decoded_field = field.decode('utf-8')
decoded_value = value.decode('utf-8')
decoded_hash[decoded_field] = decoded_value
except UnicodeDecodeError:
decoded_hash[str(field)] = f"<binary: {len(value)} bytes>"
result['value'] = decoded_hash
result['display_value'] = json.dumps(decoded_hash, indent=2, ensure_ascii=False)
elif key_type == 'list':
# List type
list_data = redis_client.lrange(key, 0, -1)
decoded_list = []
for item in list_data:
try:
decoded_item = item.decode('utf-8')
decoded_list.append(decoded_item)
except UnicodeDecodeError:
decoded_list.append(f"<binary: {len(item)} bytes>")
result['value'] = decoded_list
result['display_value'] = json.dumps(decoded_list, indent=2, ensure_ascii=False)
elif key_type == 'set':
# Set type
set_data = redis_client.smembers(key)
decoded_set = []
for item in set_data:
try:
decoded_item = item.decode('utf-8')
decoded_set.append(decoded_item)
except UnicodeDecodeError:
decoded_set.append(f"<binary: {len(item)} bytes>")
# Sort so sets compare deterministically
decoded_set.sort()
result['value'] = decoded_set
result['display_value'] = json.dumps(decoded_set, indent=2, ensure_ascii=False)
elif key_type == 'zset':
# Sorted Set type
zset_data = redis_client.zrange(key, 0, -1, withscores=True)
decoded_zset = []
for member, score in zset_data:
try:
decoded_member = member.decode('utf-8')
decoded_zset.append([decoded_member, score])
except UnicodeDecodeError:
decoded_zset.append([f"<binary: {len(member)} bytes>", score])
result['value'] = decoded_zset
result['display_value'] = json.dumps(decoded_zset, indent=2, ensure_ascii=False)
elif key_type == 'stream':
# Stream type
try:
# Fetch Stream metadata
stream_info = redis_client.xinfo_stream(key)
# Fetch Stream messages (at most the 100 most recent)
stream_messages = redis_client.xrange(key, count=100)
# Parse the Stream data
decoded_stream = {
'info': {
'length': stream_info.get('length', 0),
'radix_tree_keys': stream_info.get('radix-tree-keys', 0),
'radix_tree_nodes': stream_info.get('radix-tree-nodes', 0),
'last_generated_id': stream_info.get('last-generated-id', '').decode('utf-8') if stream_info.get('last-generated-id') else '',
'first_entry': None,
'last_entry': None
},
'messages': []
}
# Handle first-entry and last-entry
if stream_info.get('first-entry'):
first_entry = stream_info['first-entry']
decoded_stream['info']['first_entry'] = {
'id': first_entry[0].decode('utf-8'),
'fields': {first_entry[1][i].decode('utf-8'): first_entry[1][i+1].decode('utf-8')
for i in range(0, len(first_entry[1]), 2)}
}
if stream_info.get('last-entry'):
last_entry = stream_info['last-entry']
decoded_stream['info']['last_entry'] = {
'id': last_entry[0].decode('utf-8'),
'fields': {last_entry[1][i].decode('utf-8'): last_entry[1][i+1].decode('utf-8')
for i in range(0, len(last_entry[1]), 2)}
}
# Process the message list
for message in stream_messages:
message_id = message[0].decode('utf-8')
message_fields = message[1]
decoded_message = {
'id': message_id,
'fields': {}
}
# Parse the message fields
for i in range(0, len(message_fields), 2):
try:
field_name = message_fields[i].decode('utf-8')
field_value = message_fields[i+1].decode('utf-8')
decoded_message['fields'][field_name] = field_value
except (IndexError, UnicodeDecodeError):
continue
decoded_stream['messages'].append(decoded_message)
result['value'] = decoded_stream
result['display_value'] = json.dumps(decoded_stream, indent=2, ensure_ascii=False)
except Exception as stream_error:
logger.warning(f"获取Stream详细信息失败 {key}: {stream_error}")
# If the detailed fetch fails, fall back to basic info
try:
stream_length = redis_client.xlen(key)
result['value'] = {'length': stream_length, 'messages': []}
result['display_value'] = f"Stream (length: {stream_length} messages)"
except Exception:
result['value'] = "Stream data (unable to read details)"
result['display_value'] = "Stream data (unable to read details)"
else:
# Unknown type
result['value'] = f"<unsupported type: {key_type}>"
result['display_value'] = f"<unsupported type: {key_type}>"
return result
except Exception as e:
logger.error(f"获取Redis键值失败 {key}: {e}")
return {
'type': 'error',
'value': None,
'display_value': f"<error: {str(e)}>",
'exists': False
}
def compare_redis_values(value1_info, value2_info):
"""
比较两个Redis值
Args:
value1_info: 第一个值的信息字典
value2_info: 第二个值的信息字典
Returns:
dict: 比较结果
"""
# Check existence
if not value1_info['exists'] and not value2_info['exists']:
return {
'status': 'both_missing',
'message': '两个集群都不存在此键'
}
elif not value1_info['exists']:
return {
'status': 'missing_in_cluster1',
'message': '集群1中不存在此键'
}
elif not value2_info['exists']:
return {
'status': 'missing_in_cluster2',
'message': '集群2中不存在此键'
}
# Check types
type1 = value1_info['type']
type2 = value2_info['type']
if type1 != type2:
return {
'status': 'different',
'message': f'数据类型不同: {type1} vs {type2}'
}
# Compare values
value1 = value1_info['value']
value2 = value2_info['value']
if type1 in ['string', 'json_string']:
# String comparison
if value1 == value2:
return {'status': 'identical', 'message': '值相同'}
else:
return {'status': 'different', 'message': '值不同'}
elif type1 == 'hash':
# Hash comparison
if value1 == value2:
return {'status': 'identical', 'message': '哈希值相同'}
else:
# Compare hash fields in detail
keys1 = set(value1.keys())
keys2 = set(value2.keys())
if keys1 != keys2:
return {'status': 'different', 'message': f'哈希字段不同: {keys1 - keys2} vs {keys2 - keys1}'}
diff_fields = []
for key in keys1:
if value1[key] != value2[key]:
diff_fields.append(key)
if diff_fields:
return {'status': 'different', 'message': f'哈希字段值不同: {diff_fields}'}
else:
return {'status': 'identical', 'message': '哈希值相同'}
elif type1 == 'list':
# List comparison (order-sensitive)
if value1 == value2:
return {'status': 'identical', 'message': '列表相同'}
else:
return {'status': 'different', 'message': f'列表不同,长度: {len(value1)} vs {len(value2)}'}
elif type1 == 'set':
# Set comparison (order-insensitive)
if set(value1) == set(value2):
return {'status': 'identical', 'message': '集合相同'}
else:
return {'status': 'different', 'message': f'集合不同,大小: {len(value1)} vs {len(value2)}'}
elif type1 == 'zset':
# Sorted Set comparison
if value1 == value2:
return {'status': 'identical', 'message': '有序集合相同'}
else:
return {'status': 'different', 'message': f'有序集合不同,大小: {len(value1)} vs {len(value2)}'}
elif type1 == 'stream':
# Stream comparison
if value1 == value2:
return {'status': 'identical', 'message': 'Stream完全相同'}
else:
# Detailed Stream comparison
if isinstance(value1, dict) and isinstance(value2, dict):
# Compare basic Stream metadata
info1 = value1.get('info', {})
info2 = value2.get('info', {})
if info1.get('length', 0) != info2.get('length', 0):
return {
'status': 'different',
'message': f'Stream长度不同: {info1.get("length", 0)} vs {info2.get("length", 0)}'
}
# Compare the last generated IDs
if info1.get('last_generated_id') != info2.get('last_generated_id'):
return {
'status': 'different',
'message': f'Stream最后ID不同: {info1.get("last_generated_id")} vs {info2.get("last_generated_id")}'
}
# Compare message contents
messages1 = value1.get('messages', [])
messages2 = value2.get('messages', [])
if len(messages1) != len(messages2):
return {
'status': 'different',
'message': f'Stream消息数量不同: {len(messages1)} vs {len(messages2)}'
}
# Compare individual messages
for i, (msg1, msg2) in enumerate(zip(messages1, messages2)):
if msg1.get('id') != msg2.get('id'):
return {
'status': 'different',
'message': f'Stream消息ID不同 (第{i+1}条): {msg1.get("id")} vs {msg2.get("id")}'
}
if msg1.get('fields') != msg2.get('fields'):
return {
'status': 'different',
'message': f'Stream消息内容不同 (第{i+1}条消息)'
}
return {'status': 'identical', 'message': 'Stream数据相同'}
else:
return {'status': 'different', 'message': 'Stream数据格式不同'}
else:
# Generic comparison for other types
if value1 == value2:
return {'status': 'identical', 'message': '值相同'}
else:
return {'status': 'different', 'message': '值不同'}
def batch_get_redis_values_with_type(redis_client, keys, cluster_name="Redis集群", performance_tracker=None):
"""
批量获取Redis键值及类型信息
Args:
redis_client: Redis客户端
keys: 键名列表
cluster_name: 集群名称
performance_tracker: 性能追踪器
Returns:
list: 每个键的值信息字典列表
"""
import time
start_time = time.time()
results = []
logger.info(f"开始从{cluster_name}批量获取 {len(keys)} 个键的详细信息")
try:
for key in keys:
key_info = get_redis_value_with_type(redis_client, key)
results.append(key_info)
end_time = time.time()
duration = end_time - start_time
if performance_tracker:
performance_tracker.record_query(f"{cluster_name}_detailed_query", duration)
successful_count = sum(1 for r in results if r['exists'])
logger.info(f"{cluster_name}详细查询完成,成功获取 {successful_count}/{len(keys)} 个值,耗时 {duration:.3f}")
return results
except Exception as e:
logger.error(f"{cluster_name}批量详细查询失败: {e}")
# Return error placeholders
return [{'type': 'error', 'value': None, 'display_value': f'<error: {e}>', 'exists': False} for _ in keys]
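The decision order used by compare_redis_values (existence first, then type, then a type-aware value comparison) can be condensed into a sketch; the dicts mirror the value-info structure returned by get_redis_value_with_type, and only a subset of types is handled here:

```python
def compare_info(info1, info2):
    # Existence checks come first, mirroring compare_redis_values above.
    if not info1['exists'] and not info2['exists']:
        return 'both_missing'
    if not info1['exists']:
        return 'missing_in_cluster1'
    if not info2['exists']:
        return 'missing_in_cluster2'
    # A type mismatch is always a difference.
    if info1['type'] != info2['type']:
        return 'different'
    v1, v2 = info1['value'], info2['value']
    # Sets are order-insensitive; lists (and everything else here) compare as-is.
    if info1['type'] == 'set':
        return 'identical' if set(v1) == set(v2) else 'different'
    return 'identical' if v1 == v2 else 'different'

a = {'exists': True, 'type': 'set', 'value': ['x', 'y']}
b = {'exists': True, 'type': 'set', 'value': ['y', 'x']}
print(compare_info(a, b))  # identical
```

Note how the same two values would count as different under the list rule, since list comparison keeps element order.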
modules/sharding.py Normal file
@@ -0,0 +1,167 @@
"""
TWCS分表计算引擎模块
===================
本模块实现基于TWCSTime Window Compaction Strategy策略的时间分表计算功能。
核心功能:
1. 时间戳提取从Key中智能提取时间戳信息
2. 分表索引计算:基于时间窗口计算目标分表索引
3. 分表映射将大批量Key映射到对应的分表
4. 统计分析:提供分表计算的详细统计信息
TWCS分表策略
- 时间窗口可配置的时间间隔默认7天
- 分表数量可配置的分表总数默认14张
- 计算公式timestamp // interval_seconds % table_count
- 表命名base_table_name + "_" + shard_index
时间戳提取算法:
- 优先规则提取Key中最后一个下划线后的数字
- 备用规则提取Key中最长的数字序列
- 容错处理:无法提取时记录到失败列表
- 格式支持支持各种Key格式的时间戳提取
应用场景:
- 大数据表的时间分片:按时间窗口将数据分散到多张表
- 查询性能优化:减少单表数据量,提高查询效率
- 数据生命周期管理:支持按时间窗口的数据清理
- 负载均衡:将查询负载分散到多张表
性能特点:
- 批量计算支持大批量Key的高效分表计算
- 内存友好:使用生成器和迭代器优化内存使用
- 统计完整:提供详细的计算成功率和分布统计
- 错误容错单个Key计算失败不影响整体处理
作者BigDataTool项目组
更新时间2024年8月
"""
import re
import logging
logger = logging.getLogger(__name__)
class ShardingCalculator:
"""
TWCS分表计算器
基于Time Window Compaction Strategy实现的智能分表计算器
用于将时间相关的Key映射到对应的时间窗口分表。
主要特性:
- 时间窗口分片:按配置的时间间隔进行分表
- 智能时间戳提取支持多种Key格式的时间戳解析
- 负载均衡:通过取模运算实现分表间的负载均衡
- 批量处理高效处理大批量Key的分表映射
适用场景:
- 时序数据的分表存储
- 大数据表的性能优化
- 数据生命周期管理
- 查询负载分散
"""
def __init__(self, interval_seconds=604800, table_count=14):
"""
初始化分表计算器
:param interval_seconds: 时间间隔(秒)默认604800(7天)
:param table_count: 分表数量默认14
"""
self.interval_seconds = interval_seconds
self.table_count = table_count
def extract_timestamp_from_key(self, key):
"""
从Key中提取时间戳
新规则:优先提取最后一个下划线后的数字,如果没有下划线则提取最后连续的数字部分
"""
if not key:
return None
key_str = str(key)
# Method 1: if the key contains underscores, try the part after the last one
if '_' in key_str:
parts = key_str.split('_')
last_part = parts[-1]
# Check whether the last part is purely numeric
if last_part.isdigit():
timestamp = int(last_part)
logger.info(f"Key '{key}' 通过下划线分割提取到时间戳: {timestamp}")
return timestamp
# Method 2: find all digit runs and take the last of the longest ones
number_sequences = re.findall(r'\d+', key_str)
if not number_sequences:
logger.warning(f"Key '{key}' 中没有找到数字字符")
return None
# Prefer the longest run; if several share the max length, take the last one
max_length = max(len(seq) for seq in number_sequences)
last_longest = None
for seq in number_sequences:
if len(seq) == max_length:
last_longest = seq
try:
timestamp = int(last_longest)
logger.info(f"Key '{key}' 通过数字序列提取到时间戳: {timestamp} (从序列 {number_sequences} 中选择)")
return timestamp
except ValueError:
logger.error(f"Key '{key}' 时间戳转换失败: {last_longest}")
return None
def calculate_shard_index(self, timestamp):
"""
计算分表索引
公式timestamp // interval_seconds % table_count
"""
if timestamp is None:
return None
return int(timestamp) // self.interval_seconds % self.table_count
def get_shard_table_name(self, base_table_name, key):
"""
根据Key获取对应的分表名称
"""
timestamp = self.extract_timestamp_from_key(key)
if timestamp is None:
return None
shard_index = self.calculate_shard_index(timestamp)
return f"{base_table_name}_{shard_index}"
def get_all_shard_tables_for_keys(self, base_table_name, keys):
"""
为一批Keys计算所有需要查询的分表
返回: {shard_table_name: [keys_for_this_shard], ...}
"""
shard_mapping = {}
failed_keys = []
calculation_stats = {
'total_keys': len(keys),
'successful_extractions': 0,
'failed_extractions': 0,
'unique_shards': 0
}
for key in keys:
shard_table = self.get_shard_table_name(base_table_name, key)
if shard_table:
if shard_table not in shard_mapping:
shard_mapping[shard_table] = []
shard_mapping[shard_table].append(key)
calculation_stats['successful_extractions'] += 1
else:
failed_keys.append(key)
calculation_stats['failed_extractions'] += 1
calculation_stats['unique_shards'] = len(shard_mapping)
return shard_mapping, failed_keys, calculation_stats
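ShardingCalculator's two extraction rules (underscore suffix first, longest digit run as fallback) plus the window formula can be exercised standalone; the key and table names below are made up for illustration:

```python
import re

def extract_timestamp(key):
    # Rule 1: digits after the last underscore.
    if '_' in key:
        tail = key.rsplit('_', 1)[-1]
        if tail.isdigit():
            return int(tail)
    # Rule 2: the last of the longest digit runs.
    runs = re.findall(r'\d+', key)
    if not runs:
        return None
    max_len = max(len(r) for r in runs)
    return int([r for r in runs if len(r) == max_len][-1])

def shard_table(base, key, interval_seconds=604800, table_count=14):
    ts = extract_timestamp(key)
    if ts is None:
        return None
    return f"{base}_{ts // interval_seconds % table_count}"

# 1722556800 // 604800 == 2848 and 2848 % 14 == 6, so the key lands in shard 6.
print(shard_table("events", "user_event_1722556800"))  # events_6
```

Keys whose timestamps fall in the same 7-day window always map to the same shard, which is what makes TWCS-style time-window cleanup possible.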
@@ -1,2 +1,3 @@
Flask==2.3.3
cassandra-driver==3.29.1
cassandra-driver==3.29.1
redis==5.0.1
File diff suppressed because it is too large
static/js/redis_compare.js Normal file
File diff suppressed because it is too large
@@ -3,7 +3,7 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>数据库查询比对工具 - 支持分表查询</title>
<title>Cassandra数据比对工具 - DataTools Pro</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css" rel="stylesheet">
<style>
@@ -255,6 +255,16 @@
border-color: #ffb3b3;
color: #d63384;
}
/* 面包屑导航样式 */
.breadcrumb {
background: none;
padding: 0;
}
.breadcrumb-item a {
color: #007bff;
text-decoration: none;
}
</style>
</head>
<body>
@@ -262,7 +272,7 @@
<nav class="navbar navbar-expand-lg navbar-dark bg-primary">
<div class="container">
<a class="navbar-brand" href="/">
<i class="fas fa-tools"></i> 大数据工具集合
<i class="fas fa-database me-2"></i> DataTools Pro
</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav">
<span class="navbar-toggler-icon"></span>
@@ -270,13 +280,19 @@
<div class="collapse navbar-collapse" id="navbarNav">
<ul class="navbar-nav ms-auto">
<li class="nav-item">
<a class="nav-link" href="/">首页</a>
<a class="nav-link" href="/">
<i class="fas fa-home"></i> 首页
</a>
</li>
<li class="nav-item">
<a class="nav-link active" href="/db-compare">单表查询</a>
<a class="nav-link active" href="/cassandra-compare">
<i class="fas fa-database"></i> Cassandra比对
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/sharding-compare">分表查询</a>
<a class="nav-link" href="/redis-compare">
<i class="fab fa-redis"></i> Redis比对
</a>
</li>
</ul>
</div>
@@ -284,10 +300,18 @@
</nav>
<div class="container-fluid py-4">
<!-- 面包屑导航 -->
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="/">首页</a></li>
<li class="breadcrumb-item active" aria-current="page">Cassandra数据比对工具</li>
</ol>
</nav>
<div class="row">
<div class="col-12">
<h1 class="text-center mb-4">
<i class="fas fa-database"></i> 数据库查询比对工具
<i class="fas fa-database"></i> Cassandra数据比对工具
<small class="text-muted d-block fs-6 mt-2">支持单表查询和分表查询两种模式</small>
</h1>
</div>
@@ -299,72 +323,6 @@
<div class="config-section">
<h4><i class="fas fa-cogs"></i> 配置管理</h4>
<!-- 查询模式切换 -->
<div class="card mb-3">
<div class="card-header">
<h6><i class="fas fa-toggle-on"></i> 查询模式</h6>
</div>
<div class="card-body">
<div class="form-check form-switch">
<input class="form-check-input" type="checkbox" id="enableSharding" onchange="toggleShardingMode()">
<label class="form-check-label" for="enableSharding">
<strong>启用分表查询模式</strong>
<small class="text-muted d-block">支持TWCS时间分表的智能索引计算</small>
</label>
</div>
</div>
</div>
<!-- 分表参数配置 (默认隐藏) -->
<div class="sharding-config-section" id="shardingConfig" style="display: none;">
<h5><i class="fas fa-layer-group"></i> 分表参数配置</h5>
<div class="row">
<div class="col-md-6">
<h6 class="text-primary">生产环境分表配置</h6>
<div class="form-check mb-2">
<input class="form-check-input" type="checkbox" id="use_sharding_for_pro" checked>
<label class="form-check-label" for="use_sharding_for_pro">
启用分表查询
</label>
</div>
<div class="mb-2">
<label for="pro_interval_seconds" class="form-label">时间间隔(秒)</label>
<input type="number" class="form-control form-control-sm" id="pro_interval_seconds" value="604800" min="1">
<small class="form-text text-muted">默认604800秒(7天)</small>
</div>
<div class="mb-2">
<label for="pro_table_count" class="form-label">分表数量</label>
<input type="number" class="form-control form-control-sm" id="pro_table_count" value="14" min="1" max="100">
<small class="form-text text-muted">默认14张分表</small>
</div>
</div>
<div class="col-md-6">
<h6 class="text-success">测试环境分表配置</h6>
<div class="form-check mb-2">
<input class="form-check-input" type="checkbox" id="use_sharding_for_test">
<label class="form-check-label" for="use_sharding_for_test">
启用分表查询
</label>
</div>
<div class="mb-2">
<label for="test_interval_seconds" class="form-label">时间间隔(秒)</label>
<input type="number" class="form-control form-control-sm" id="test_interval_seconds" value="604800" min="1">
<small class="form-text text-muted">默认604800秒(7天)</small>
</div>
<div class="mb-2">
<label for="test_table_count" class="form-label">分表数量</label>
<input type="number" class="form-control form-control-sm" id="test_table_count" value="14" min="1" max="100">
<small class="form-text text-muted">默认14张分表</small>
</div>
</div>
</div>
<div class="alert alert-info mt-3" role="alert">
<i class="fas fa-info-circle"></i>
<strong>分表计算说明:</strong>系统会自动从Key值中提取时间戳并计算分表索引。
计算公式:<code>时间戳 // 间隔秒数 % 分表数量</code>
</div>
</div>
<!-- 配置组管理 -->
<div class="card mb-3">
<div class="card-header">
@@ -396,12 +354,17 @@
</div>
</div>
<div class="row mt-2">
<div class="col-6">
<div class="col-4">
<button class="btn btn-warning btn-sm w-100" onclick="showQueryHistoryDialog()">
<i class="fas fa-history"></i> 查询历史
</button>
</div>
<div class="col-6">
<div class="col-4">
<button class="btn btn-info btn-sm w-100" onclick="showQueryLogsDialog()">
<i class="fas fa-file-alt"></i> 查询日志
</button>
</div>
<div class="col-4">
<button class="btn btn-secondary btn-sm w-100" onclick="showSaveHistoryDialog()">
<i class="fas fa-bookmark"></i> 保存历史
</button>
@@ -419,6 +382,77 @@
</button>
</div>
<!-- 查询模式切换 -->
<div class="card mb-3">
<div class="card-header">
<h6><i class="fas fa-toggle-on"></i> 查询模式</h6>
</div>
<div class="card-body">
<div class="form-check form-switch">
<input class="form-check-input" type="checkbox" id="enableSharding" onchange="toggleShardingMode()">
<label class="form-check-label" for="enableSharding">
<strong>启用分表查询模式</strong>
<small class="text-muted d-block">支持TWCS时间分表的智能索引计算</small>
</label>
</div>
</div>
</div>
<!-- 分表参数配置 (默认隐藏) -->
<div class="sharding-config-section" id="shardingConfig" style="display: none;">
<div class="card mb-3">
<div class="card-header">
<h6><i class="fas fa-layer-group"></i> 分表参数配置</h6>
</div>
<div class="card-body">
<div class="row">
<div class="col-md-6">
<h6 class="text-primary">生产环境分表配置</h6>
<div class="form-check mb-2">
<input class="form-check-input" type="checkbox" id="use_sharding_for_pro" checked>
<label class="form-check-label" for="use_sharding_for_pro">
启用分表查询
</label>
</div>
<div class="mb-2">
<label for="pro_interval_seconds" class="form-label">时间间隔(秒)</label>
<input type="number" class="form-control form-control-sm" id="pro_interval_seconds" value="604800" min="1">
<small class="form-text text-muted">默认604800秒(7天)</small>
</div>
<div class="mb-2">
<label for="pro_table_count" class="form-label">分表数量</label>
<input type="number" class="form-control form-control-sm" id="pro_table_count" value="14" min="1" max="100">
<small class="form-text text-muted">默认14张分表</small>
</div>
</div>
<div class="col-md-6">
<h6 class="text-success">测试环境分表配置</h6>
<div class="form-check mb-2">
<input class="form-check-input" type="checkbox" id="use_sharding_for_test">
<label class="form-check-label" for="use_sharding_for_test">
启用分表查询
</label>
</div>
<div class="mb-2">
<label for="test_interval_seconds" class="form-label">时间间隔(秒)</label>
<input type="number" class="form-control form-control-sm" id="test_interval_seconds" value="604800" min="1">
<small class="form-text text-muted">默认604800秒(7天)</small>
</div>
<div class="mb-2">
<label for="test_table_count" class="form-label">分表数量</label>
<input type="number" class="form-control form-control-sm" id="test_table_count" value="14" min="1" max="100">
<small class="form-text text-muted">默认14张分表</small>
</div>
</div>
</div>
<div class="alert alert-info mt-3">
<strong>分表计算说明:</strong>系统会自动从Key值中提取时间戳并计算分表索引。
计算公式:<code>时间戳 // 间隔秒数 % 分表数量</code>
</div>
</div>
</div>
</div>
<!-- 生产环境配置 -->
<div class="card mb-3">
<div class="card-header d-flex justify-content-between align-items-center">
@@ -441,7 +475,7 @@
<div class="row mt-2">
<div class="col-8">
<label class="form-label">集群节点 (逗号分隔)</label>
<input type="text" class="form-control form-control-sm" id="pro_hosts" placeholder="10.20.2.22,10.20.2.23">
<input type="text" class="form-control form-control-sm" id="pro_hosts" placeholder="127.0.0.1,127.0.0.2">
</div>
<div class="col-4">
<label class="form-label">端口</label>
@@ -451,7 +485,7 @@
<div class="row mt-2">
<div class="col-6">
<label class="form-label">用户名</label>
<input type="text" class="form-control form-control-sm" id="pro_username" placeholder="cbase">
<input type="text" class="form-control form-control-sm" id="pro_username" placeholder="username">
</div>
<div class="col-6">
<label class="form-label">密码</label>
@@ -461,11 +495,11 @@
<div class="row mt-2">
<div class="col-6">
<label class="form-label">Keyspace</label>
<input type="text" class="form-control form-control-sm" id="pro_keyspace" placeholder="yuqing_skinny">
<input type="text" class="form-control form-control-sm" id="pro_keyspace" placeholder="keyspace">
</div>
<div class="col-6">
<label class="form-label">表名</label>
<input type="text" class="form-control form-control-sm" id="pro_table" placeholder="document">
<input type="text" class="form-control form-control-sm" id="pro_table" placeholder="tablename">
<small class="form-text text-muted" id="pro_table_hint">完整表名或基础表名(分表时)</small>
</div>
</div>
@@ -494,7 +528,7 @@
<div class="row mt-2">
<div class="col-8">
<label class="form-label">集群节点 (逗号分隔)</label>
<input type="text" class="form-control form-control-sm" id="test_hosts" placeholder="10.20.2.22,10.20.2.23">
<input type="text" class="form-control form-control-sm" id="test_hosts" placeholder="127.0.0.1,127.0.0.2">
</div>
<div class="col-4">
<label class="form-label">端口</label>
@@ -504,7 +538,7 @@
<div class="row mt-2">
<div class="col-6">
<label class="form-label">用户名</label>
<input type="text" class="form-control form-control-sm" id="test_username" placeholder="cbase">
<input type="text" class="form-control form-control-sm" id="test_username" placeholder="username">
</div>
<div class="col-6">
<label class="form-label">密码</label>
@@ -514,11 +548,11 @@
<div class="row mt-2">
<div class="col-6">
<label class="form-label">Keyspace</label>
<input type="text" class="form-control form-control-sm" id="test_keyspace" placeholder="yuqing_skinny">
<input type="text" class="form-control form-control-sm" id="test_keyspace" placeholder="keyspace">
</div>
<div class="col-6">
<label class="form-label">表名</label>
<input type="text" class="form-control form-control-sm" id="test_table" placeholder="document_test">
<input type="text" class="form-control form-control-sm" id="test_table" placeholder="tablename">
<small class="form-text text-muted" id="test_table_hint">完整表名或基础表名(分表时)</small>
</div>
</div>
@@ -532,9 +566,9 @@
</div>
<div class="card-body">
<div class="mb-3">
<label class="form-label">主键字段 (逗号分隔)</label>
<input type="text" class="form-control form-control-sm" id="keys" placeholder="docid 或 wmid" value="docid">
<small class="form-text text-muted">分表模式下推荐使用包含时间戳的字段如wmid</small>
<label class="form-label">主键字段 (逗号分隔,支持复合主键)</label>
<input type="text" class="form-control form-control-sm" id="keys" placeholder="单主键:docid 或 复合主键docid,id" value="">
<small class="form-text text-muted">支持单主键或复合主键复合主键用逗号分隔docid,id</small>
</div>
<div class="mb-3">
<label class="form-label">比较字段 (空则比较全部,逗号分隔)</label>
@@ -555,7 +589,7 @@
<h4><i class="fas fa-key"></i> 查询Key管理</h4>
<div class="mb-3">
<label class="form-label">批量Key输入 (一行一个)</label>
<textarea class="form-control query-keys" id="query_values" placeholder="请输入查询的Key值一行一个&#10;单表查询示例:&#10;key1&#10;key2&#10;key3&#10;&#10;分表查询示例(包含时间戳):&#10;wmid_1609459200&#10;wmid_1610064000&#10;wmid_1610668800"></textarea>
<textarea class="form-control query-keys" id="query_values" placeholder="请输入查询的Key值一行一个&#10;单主键示例:&#10;key1&#10;key2&#10;key3&#10;&#10;复合主键示例(逗号分隔):&#10;docid1,id1&#10;docid2,id2&#10;docid3,id3&#10;&#10;分表查询示例(包含时间戳):&#10;wmid_1609459200&#10;wmid_1610064000&#10;wmid_1610668800"></textarea>
<small class="form-text text-muted" id="key_input_hint">单表模式输入普通Key值 | 分表模式Key值应包含时间戳用于计算分表索引</small>
</div>
<div class="mb-3">
@@ -604,13 +638,13 @@
</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="summary-tab" data-bs-toggle="tab" data-bs-target="#summary-panel" type="button" role="tab">
<i class="fas fa-chart-pie"></i> 比较总结
<button class="nav-link" id="rawdata-tab" data-bs-toggle="tab" data-bs-target="#rawdata-panel" type="button" role="tab">
<i class="fas fa-database"></i> 原始数据 <span class="badge bg-info ms-1" id="rawdata-count">0</span>
</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="logs-tab" data-bs-toggle="tab" data-bs-target="#logs-panel" type="button" role="tab">
<i class="fas fa-file-alt"></i> 查询日志
<button class="nav-link" id="summary-tab" data-bs-toggle="tab" data-bs-target="#summary-panel" type="button" role="tab">
<i class="fas fa-chart-pie"></i> 比较总结
</button>
</li>
</ul>
@@ -637,44 +671,46 @@
</div>
</div>
<!-- 原始数据面板 -->
<div class="tab-pane fade" id="rawdata-panel" role="tabpanel">
<div class="container-fluid">
<!-- 筛选控制区 -->
<div class="row mb-3">
<div class="col-md-6">
<div class="d-flex align-items-center">
<label class="form-label me-2 mb-0">显示环境:</label>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="showProData" checked onchange="filterRawData()">
<label class="form-check-label" for="showProData">生产环境</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="showTestData" checked onchange="filterRawData()">
<label class="form-check-label" for="showTestData">测试环境</label>
</div>
</div>
</div>
<div class="col-md-6">
<div class="input-group">
<span class="input-group-text"><i class="fas fa-search"></i></span>
<input type="text" class="form-control" placeholder="搜索Key或字段值..."
onkeyup="searchRawData(this.value)" id="rawDataSearch">
</div>
</div>
</div>
<!-- 原始数据内容 -->
<div id="raw-data-content">
<!-- 原始数据将在这里动态生成 -->
</div>
</div>
</div>
<!-- 比较总结面板 -->
<div class="tab-pane fade" id="summary-panel" role="tabpanel">
<div id="comparison-summary">
<!-- 总结报告将在这里动态生成 -->
</div>
</div>
<!-- 查询日志面板 -->
<div class="tab-pane fade" id="logs-panel" role="tabpanel">
<div class="d-flex justify-content-between align-items-center mb-3">
<div class="d-flex align-items-center">
<h6 class="mb-0 me-3">查询执行日志</h6>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="log-level-info" checked onchange="filterLogsByLevel()">
<label class="form-check-label text-primary" for="log-level-info">INFO</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="log-level-warning" checked onchange="filterLogsByLevel()">
<label class="form-check-label text-warning" for="log-level-warning">WARNING</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="log-level-error" checked onchange="filterLogsByLevel()">
<label class="form-check-label text-danger" for="log-level-error">ERROR</label>
</div>
</div>
<div>
<button class="btn btn-sm btn-outline-primary me-2" onclick="refreshQueryLogs()">
<i class="fas fa-sync-alt"></i> 刷新
</button>
<button class="btn btn-sm btn-outline-danger" onclick="clearQueryLogs()">
<i class="fas fa-trash"></i> 清空
</button>
</div>
</div>
<div id="query-logs" style="max-height: 500px; overflow-y: auto;">
<!-- 查询日志将在这里动态生成 -->
</div>
</div>
</div>
</div>
</div>
@@ -683,6 +719,56 @@
</div>
</div>
<!-- 查询日志模态框 -->
<div class="modal fade" id="queryLogsModal" tabindex="-1" aria-labelledby="queryLogsModalLabel" aria-hidden="true">
<div class="modal-dialog modal-xl">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="queryLogsModalLabel">
<i class="fas fa-file-alt"></i> 查询日志管理
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<div class="d-flex justify-content-between align-items-center mb-3">
<div class="d-flex align-items-center">
<h6 class="mb-0 me-3">查询执行日志</h6>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="modal-log-level-info" checked onchange="filterModalLogsByLevel()">
<label class="form-check-label text-primary" for="modal-log-level-info">INFO</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="modal-log-level-warning" checked onchange="filterModalLogsByLevel()">
<label class="form-check-label text-warning" for="modal-log-level-warning">WARNING</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="modal-log-level-error" checked onchange="filterModalLogsByLevel()">
<label class="form-check-label text-danger" for="modal-log-level-error">ERROR</label>
</div>
</div>
<div>
<button class="btn btn-sm btn-outline-primary me-2" onclick="refreshModalQueryLogs()">
<i class="fas fa-sync-alt"></i> 刷新
</button>
<button class="btn btn-sm btn-outline-warning me-2" onclick="cleanupOldLogs()">
<i class="fas fa-broom"></i> 清理旧日志
</button>
<button class="btn btn-sm btn-outline-danger" onclick="clearQueryLogs()">
<i class="fas fa-trash"></i> 清空
</button>
</div>
</div>
<div id="modal-query-logs" style="max-height: 600px; overflow-y: auto;">
<!-- 查询日志将在这里动态生成 -->
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">关闭</button>
</div>
</div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
<script src="{{ url_for('static', filename='js/app.js') }}"></script>
</body>
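The key textarea in the template above accepts single keys (`key1`), composite keys (`docid1,id1`), and timestamped sharding keys, one per line. A minimal sketch of turning that input into key tuples (hypothetical helper, not the project's actual parser):

```python
def parse_key_lines(text):
    """Split textarea input into key tuples; composite keys are comma-separated."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keys.append(tuple(part.strip() for part in line.split(",")))
    return keys

print(parse_key_lines("key1\ndocid2,id2\n"))
# → [('key1',), ('docid2', 'id2')]
```

Single keys come back as one-element tuples, so downstream query-building code can treat both shapes uniformly.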

View File

@@ -3,7 +3,7 @@
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>大数据工具集合</title>
<title>DataTools Pro - 专业数据处理工具平台</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css" rel="stylesheet">
<style>
@@ -152,7 +152,7 @@
<nav class="navbar navbar-expand-lg navbar-dark" style="background: rgba(0,0,0,0.1);">
<div class="container">
<a class="navbar-brand" href="/">
<i class="fas fa-tools"></i> 大数据工具集合
<i class="fas fa-code-branch"></i> DataTools Pro
</a>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNav">
<span class="navbar-toggler-icon"></span>
@@ -163,7 +163,10 @@
<a class="nav-link active" href="/">首页</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/db-compare">数据库比对</a>
<a class="nav-link" href="/cassandra-compare">Cassandra比对</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/redis-compare">Redis比对</a>
</li>
</ul>
</div>
@@ -174,11 +177,11 @@
<div class="hero-section">
<div class="container">
<h1 class="hero-title">
<i class="fas fa-database"></i> 大数据工具集合
<i class="fas fa-rocket"></i> DataTools Pro
</h1>
<p class="hero-subtitle">
专业的数据处理、分析和比对工具平台<br>
提升数据工作效率,简化复杂操作
企业级数据处理比对工具平台<br>
高效、精准、可视化的数据分析解决方案
</p>
</div>
</div>
@@ -189,20 +192,20 @@
<div class="row">
<div class="col-md-4">
<div class="stat-item">
<span class="stat-number">1</span>
<span class="stat-label">可用工具</span>
<span class="stat-number">2</span>
<span class="stat-label">核心工具</span>
</div>
</div>
<div class="col-md-4">
<div class="stat-item">
<span class="stat-number">100%</span>
<span class="stat-label">可视化操作</span>
<span class="stat-number"></span>
<span class="stat-label">数据处理能力</span>
</div>
</div>
<div class="col-md-4">
<div class="stat-item">
<span class="stat-number">0</span>
<span class="stat-label">学习成本</span>
<span class="stat-number">24/7</span>
<span class="stat-label">稳定运行</span>
</div>
</div>
</div>
@@ -215,146 +218,79 @@
<div class="row justify-content-center">
<div class="col-lg-8">
<div class="text-center mb-5">
<h2 class="mb-3">可用工具</h2>
<p class="text-muted">选择适的工具来处理您的数据任务</p>
<h2 class="mb-3">核心工具模块</h2>
<p class="text-muted">选择适的工具来解决您的数据处理挑战</p>
</div>
</div>
</div>
<div class="row">
<!-- 数据库比对工具 -->
<!-- Cassandra数据库比对工具 -->
<div class="col-lg-6 col-md-12">
<div class="tool-card">
<div class="text-center">
<div class="feature-badge">可用</div>
<div class="feature-badge">生产就绪</div>
<div class="tool-icon">
<i class="fas fa-exchange-alt"></i>
<i class="fas fa-database"></i>
</div>
<h3 class="tool-title">数据库查询比对工具</h3>
<h3 class="tool-title">Cassandra数据比对工具</h3>
<p class="tool-description">
专业的Cassandra数据库比对工具,支持生产环境与测试环境数据差异分析,
提供批量查询、字段级比对和详细统计报告。
企业级Cassandra数据库比对平台,支持生产环境与测试环境数据差异分析,
提供批量查询、分表查询、多主键查询和详细统计报告。
</p>
</div>
<div class="tool-features">
<h5><i class="fas fa-star text-warning"></i> 核心功能</h5>
<h5><i class="fas fa-star text-warning"></i> 核心特性</h5>
<ul>
<li><i class="fas fa-check text-success"></i> 支持多环境数据库配置管理</li>
<li><i class="fas fa-check text-success"></i> 批量Key查询和数据比对</li>
<li><i class="fas fa-check text-success"></i> 自定义比较字段和排除字段</li>
<li><i class="fas fa-check text-success"></i> 可视化差异展示和统计</li>
<li><i class="fas fa-check text-success"></i> 配置和结果导出功能</li>
<li><i class="fas fa-check text-success"></i> 多环境数据库配置管理</li>
<li><i class="fas fa-check text-success"></i> 分表查询TWCS时间分表</li>
<li><i class="fas fa-check text-success"></i> 多主键查询和复合主键支持</li>
<li><i class="fas fa-check text-success"></i> 可视化差异展示和智能统计</li>
<li><i class="fas fa-check text-success"></i> 查询历史管理和结果导出</li>
<li><i class="fas fa-check text-success"></i> 详细的查询日志和性能监控</li>
</ul>
</div>
<div class="text-center">
<a href="/db-compare" class="tool-btn">
<a href="/cassandra-compare" class="tool-btn">
<i class="fas fa-rocket"></i> 立即使用
</a>
</div>
</div>
</div>
<!-- 占位工具卡片 -->
<!-- Redis集群比对工具 -->
<div class="col-lg-6 col-md-12">
<div class="tool-card">
<div class="text-center">
<div class="feature-badge coming-soon">即将推出</div>
<div class="feature-badge">生产就绪</div>
<div class="tool-icon">
<i class="fas fa-chart-line"></i>
<i class="fab fa-redis"></i>
</div>
<h3 class="tool-title">数据分析工具</h3>
<h3 class="tool-title">Redis集群比对工具</h3>
<p class="tool-description">
强大的数据分析和可视化工具,支持多种数据源,
提供丰富的图表类型和统计分析功能。
专业的Redis集群数据比对平台,支持多种数据类型比对分析
提供随机采样、指定Key查询和全面的性能监控
</p>
</div>
<div class="tool-features">
<h5><i class="fas fa-star text-warning"></i> 计划功能</h5>
<h5><i class="fas fa-star text-warning"></i> 核心特性</h5>
<ul>
<li><i class="fas fa-clock text-muted"></i> 多数据源连接支持</li>
<li><i class="fas fa-clock text-muted"></i> 交互式图表生成</li>
<li><i class="fas fa-clock text-muted"></i> 自定义报表制作</li>
<li><i class="fas fa-clock text-muted"></i> 数据趋势分析</li>
<li><i class="fas fa-clock text-muted"></i> 自动化报告生成</li>
<li><i class="fas fa-check text-success"></i> Redis集群连接和配置管理</li>
<li><i class="fas fa-check text-success"></i> 智能随机采样和指定Key查询</li>
<li><i class="fas fa-check text-success"></i> 全数据类型支持String/Hash/List/Set/ZSet</li>
<li><i class="fas fa-check text-success"></i> 实时性能统计和详细报告</li>
<li><i class="fas fa-check text-success"></i> 批量操作和历史记录管理</li>
<li><i class="fas fa-check text-success"></i> 可视化数据展示和导出功能</li>
</ul>
</div>
<div class="text-center">
<button class="tool-btn" disabled style="opacity: 0.6;">
<i class="fas fa-hourglass-half"></i> 开发中
</button>
</div>
</div>
</div>
</div>
<!-- 第二行工具 -->
<div class="row mt-4">
<div class="col-lg-6 col-md-12">
<div class="tool-card">
<div class="text-center">
<div class="feature-badge coming-soon">即将推出</div>
<div class="tool-icon">
<i class="fas fa-file-import"></i>
</div>
<h3 class="tool-title">数据导入导出工具</h3>
<p class="tool-description">
高效的数据迁移工具,支持多种格式和数据库类型之间的数据传输,
提供批量处理和进度监控功能。
</p>
</div>
<div class="tool-features">
<h5><i class="fas fa-star text-warning"></i> 计划功能:</h5>
<ul>
<li><i class="fas fa-clock text-muted"></i> 多格式数据支持</li>
<li><i class="fas fa-clock text-muted"></i> 批量数据处理</li>
<li><i class="fas fa-clock text-muted"></i> 实时进度监控</li>
<li><i class="fas fa-clock text-muted"></i> 数据映射配置</li>
<li><i class="fas fa-clock text-muted"></i> 错误处理和日志</li>
</ul>
</div>
<div class="text-center">
<button class="tool-btn" disabled style="opacity: 0.6;">
<i class="fas fa-hourglass-half"></i> 开发中
</button>
</div>
</div>
</div>
<div class="col-lg-6 col-md-12">
<div class="tool-card">
<div class="text-center">
<div class="feature-badge coming-soon">即将推出</div>
<div class="tool-icon">
<i class="fas fa-shield-alt"></i>
</div>
<h3 class="tool-title">数据质量检测工具</h3>
<p class="tool-description">
专业的数据质量评估工具,自动检测数据完整性、一致性和准确性问题,
生成详细的质量报告和改进建议。
</p>
</div>
<div class="tool-features">
<h5><i class="fas fa-star text-warning"></i> 计划功能:</h5>
<ul>
<li><i class="fas fa-clock text-muted"></i> 数据完整性检查</li>
<li><i class="fas fa-clock text-muted"></i> 重复数据检测</li>
<li><i class="fas fa-clock text-muted"></i> 数据格式验证</li>
<li><i class="fas fa-clock text-muted"></i> 质量评分系统</li>
<li><i class="fas fa-clock text-muted"></i> 自动化修复建议</li>
</ul>
</div>
<div class="text-center">
<button class="tool-btn" disabled style="opacity: 0.6;">
<i class="fas fa-hourglass-half"></i> 开发中
</button>
<a href="/redis-compare" class="tool-btn">
<i class="fas fa-rocket"></i> 立即使用
</a>
</div>
</div>
</div>
@@ -365,10 +301,64 @@
<!-- 页脚 -->
<div class="footer">
<div class="container">
<p>&copy; 2024 大数据工具集合. 专注于提供高效的数据处理解决方案.</p>
<p>&copy; 2024 DataTools Pro. 企业级数据处理与比对解决方案.</p>
<p class="mb-0">
<small class="text-muted">
Version 2.0 | Powered by Flask & Bootstrap |
<i class="fas fa-heart text-danger"></i> Made with passion for data professionals
</small>
</p>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
<!-- 新的模块化JS结构 -->
<script type="module" src="/static/js/app-main.js"></script>
<!-- 页面特定脚本 -->
<script type="module">
// 首页特定的功能
import app from '/static/js/app-main.js';
// 首页初始化完成后的操作
document.addEventListener('DOMContentLoaded', function() {
// 为工具卡片添加悬停效果
document.querySelectorAll('.tool-card').forEach(card => {
card.addEventListener('mouseenter', function() {
this.style.transform = 'translateY(-5px)';
});
card.addEventListener('mouseleave', function() {
this.style.transform = 'translateY(0)';
});
});
// 统计数字动画效果
function animateNumbers() {
const stats = document.querySelectorAll('.stat-number');
stats.forEach(stat => {
const target = stat.textContent;
if (target === '∞' || target === '24/7') return;
const num = parseInt(target);
if (isNaN(num)) return;
let current = 0;
const increment = num / 20;
const timer = setInterval(() => {
current += increment;
if (current >= num) {
current = num;
clearInterval(timer);
}
stat.textContent = Math.floor(current);
}, 50);
});
}
// 页面加载后延迟执行动画
setTimeout(animateNumbers, 500);
});
</script>
</body>
</html>

View File

@@ -0,0 +1,966 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Redis集群比对工具 - DataTools Pro</title>
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css" rel="stylesheet">
<style>
.config-section {
background-color: #f8f9fa;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
}
.result-section {
margin-top: 30px;
}
.difference-item {
border-left: 4px solid #dc3545;
padding-left: 15px;
margin-bottom: 15px;
background-color: #fff5f5;
padding: 15px;
border-radius: 5px;
}
.identical-item {
border-left: 4px solid #28a745;
padding-left: 15px;
margin-bottom: 15px;
background-color: #f8fff8;
padding: 15px;
border-radius: 5px;
}
.missing-item {
border-left: 4px solid #ffc107;
padding-left: 15px;
margin-bottom: 15px;
background-color: #fffbf0;
padding: 15px;
border-radius: 5px;
}
.stat-card {
text-align: center;
padding: 20px;
border-radius: 10px;
margin-bottom: 20px;
}
.loading {
display: none;
}
.query-keys {
min-height: 120px;
}
.redis-value {
font-family: 'Courier New', monospace;
font-size: 0.9em;
background-color: #f8f9fa !important;
white-space: pre-wrap;
word-break: break-all;
}
.node-input {
display: flex;
align-items: center;
margin-bottom: 10px;
}
.node-input input {
margin-right: 10px;
}
.btn-redis {
background: linear-gradient(135deg, #dc143c 0%, #b91c1c 100%);
border: none;
color: white;
}
.btn-redis:hover {
background: linear-gradient(135deg, #b91c1c 0%, #991b1b 100%);
color: white;
transform: translateY(-1px);
}
.redis-logo {
color: #dc143c;
}
.cluster-config {
border: 2px solid #e9ecef;
border-radius: 10px;
padding: 15px;
margin-bottom: 15px;
}
.cluster-config.active {
border-color: #dc143c;
background-color: #fff8f8;
}
.log-viewer {
background-color: #1e1e1e;
color: #d4d4d4;
font-family: 'Courier New', monospace;
font-size: 0.9em;
padding: 15px;
border-radius: 5px;
max-height: 400px;
overflow-y: auto;
}
.navbar {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
}
.breadcrumb {
background: none;
padding: 0;
}
.breadcrumb-item a {
color: #007bff;
text-decoration: none;
}
.pagination {
--bs-pagination-padding-x: 0.5rem;
--bs-pagination-padding-y: 0.25rem;
--bs-pagination-font-size: 0.875rem;
}
/* 确保提示消息在最顶层 */
.alert {
position: fixed !important;
top: 20px !important;
left: 50% !important;
transform: translateX(-50%) !important;
z-index: 9999 !important;
min-width: 300px !important;
max-width: 600px !important;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3) !important;
}
/* 拖拽区域样式 */
#dropZone {
transition: all 0.3s ease;
}
#dropZone:hover {
border-color: #dc143c !important;
background-color: #fff8f8;
}
#dropZone.dragover {
border-color: #dc143c !important;
background-color: #fff8f8;
transform: scale(1.02);
}
/* 配置预览样式 */
.config-preview-item {
display: flex;
justify-content: space-between;
align-items: center;
padding: 5px 0;
border-bottom: 1px solid #eee;
}
.config-preview-item:last-child {
border-bottom: none;
}
.config-preview-label {
font-weight: 600;
color: #495057;
}
.config-preview-value {
color: #6c757d;
font-family: 'Courier New', monospace;
font-size: 0.9em;
}
/* Redis日志分组样式 */
.redis-log-group {
border: 1px solid #dee2e6;
border-radius: 8px;
overflow: hidden;
}
.redis-log-group .card-header {
transition: background-color 0.2s ease;
}
.redis-log-group .card-header:hover {
background-color: rgba(0, 123, 255, 0.1) !important;
}
.redis-log-content {
max-height: 400px;
overflow-y: auto;
}
.log-item {
background-color: #f8f9fa;
border-radius: 4px;
transition: all 0.2s ease;
}
.log-item:hover {
background-color: #e9ecef;
transform: translateX(2px);
}
.log-item .badge {
min-width: 60px;
font-size: 0.75em;
}
/* 日志级别边框颜色 */
.border-info {
border-color: #0dcaf0 !important;
}
.border-warning {
border-color: #ffc107 !important;
}
.border-danger {
border-color: #dc3545 !important;
}
.border-secondary {
border-color: #6c757d !important;
}
</style>
</head>
<body>
<!-- 导航栏 -->
<nav class="navbar navbar-expand-lg navbar-dark">
<div class="container">
<a class="navbar-brand" href="/">
<i class="fab fa-redis me-2 redis-logo"></i>
DataTools Pro
</a>
<div class="navbar-nav ms-auto">
<a class="nav-link" href="/">
<i class="fas fa-home"></i> 首页
</a>
<a class="nav-link" href="/cassandra-compare">
<i class="fas fa-database"></i> Cassandra比对
</a>
<a class="nav-link active" href="/redis-compare">
<i class="fab fa-redis"></i> Redis比对
</a>
</div>
</div>
</nav>
<div class="container-fluid mt-4">
<!-- 面包屑导航 -->
<nav aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="/">首页</a></li>
<li class="breadcrumb-item active" aria-current="page">Redis集群比对工具</li>
</ol>
</nav>
<!-- 页面标题 -->
<div class="row">
<div class="col-12">
<h1 class="text-center mb-4">
<i class="fab fa-redis redis-logo"></i> Redis集群比对工具
<small class="text-muted d-block fs-6 mt-2">专业的Redis集群数据比对工具支持随机采样和指定Key查询</small>
</h1>
</div>
</div>
<div class="row">
<!-- 配置面板 -->
<div class="col-lg-4">
<div class="config-section">
<h4><i class="fas fa-cogs"></i> 配置管理</h4>
<!-- 配置组管理 -->
<div class="card mb-3">
<div class="card-header">
<h6><i class="fas fa-layer-group"></i> 配置组管理</h6>
</div>
<div class="card-body">
<div class="row mb-3">
<div class="col-8">
<select class="form-select form-select-sm" id="redisConfigGroupSelect">
<option value="">选择Redis配置组...</option>
</select>
</div>
<div class="col-4">
<button class="btn btn-primary btn-sm w-100" onclick="loadSelectedRedisConfigGroup()">
<i class="fas fa-download"></i> 加载
</button>
</div>
</div>
<div class="row">
<div class="col-6">
<button class="btn btn-success btn-sm w-100" onclick="showSaveRedisConfigDialog()">
<i class="fas fa-save"></i> 保存配置组
</button>
</div>
<div class="col-6">
<button class="btn btn-info btn-sm w-100" onclick="showManageRedisConfigDialog()">
<i class="fas fa-cog"></i> 管理配置组
</button>
</div>
</div>
<div class="row mt-2">
<div class="col-4">
<button class="btn btn-warning btn-sm w-100" onclick="showRedisQueryHistoryDialog()">
<i class="fas fa-history"></i> 查询历史
</button>
</div>
<div class="col-4">
<button class="btn btn-info btn-sm w-100" onclick="showRedisQueryLogsDialog()">
<i class="fas fa-file-alt"></i> 查询日志
</button>
</div>
<div class="col-4">
<button class="btn btn-secondary btn-sm w-100" onclick="showSaveRedisHistoryDialog()">
<i class="fas fa-bookmark"></i> 保存历史
</button>
</div>
</div>
</div>
</div>
<!-- Redis cluster configuration -->
<div class="card mb-3">
<div class="card-header">
<h6><i class="fas fa-server"></i> Redis集群配置</h6>
</div>
<div class="card-body">
<!-- Cluster 1 configuration -->
<div class="cluster-config mb-3">
<div class="d-flex justify-content-between align-items-center mb-2">
<h6 class="text-primary mb-0"><i class="fas fa-server"></i> 集群1 (生产)</h6>
<button type="button" class="btn btn-outline-primary btn-sm" onclick="showImportConfigDialog('cluster1')" title="导入YAML配置">
<i class="fas fa-file-import"></i> 导入配置
</button>
</div>
<div class="row mb-2">
<div class="col-12">
<label class="form-label">集群名称</label>
<input type="text" class="form-control form-control-sm" id="cluster1Name" placeholder="生产集群" value="生产集群">
</div>
</div>
<div class="mb-2">
<label class="form-label">Redis节点</label>
<div id="cluster1Nodes">
<div class="node-input">
<input type="text" class="form-control form-control-sm me-2" placeholder="127.0.0.1" value="127.0.0.1" style="flex: 2;">
<input type="number" class="form-control form-control-sm me-2" placeholder="6379" value="6379" style="flex: 1;">
<button type="button" class="btn btn-outline-danger btn-sm" onclick="removeRedisNode(this, 'cluster1')">
<i class="fas fa-minus"></i>
</button>
</div>
</div>
<button type="button" class="btn btn-outline-primary btn-sm mt-2" onclick="addRedisNode('cluster1')">
<i class="fas fa-plus"></i> 添加节点
</button>
</div>
<div class="row mb-2">
<div class="col-12">
<label class="form-label">密码</label>
<input type="password" class="form-control form-control-sm" id="cluster1Password" placeholder="可选">
</div>
</div>
<div class="row mb-2">
<div class="col-4">
<label class="form-label">连接超时(秒)</label>
<input type="number" class="form-control form-control-sm" id="cluster1SocketTimeout" value="3" min="1" max="60">
</div>
<div class="col-4">
<label class="form-label">建立超时(秒)</label>
<input type="number" class="form-control form-control-sm" id="cluster1SocketConnectTimeout" value="3" min="1" max="60">
</div>
<div class="col-4">
<label class="form-label">最大连接数</label>
<input type="number" class="form-control form-control-sm" id="cluster1MaxConnectionsPerNode" value="16" min="1" max="100">
</div>
</div>
<div class="row">
<div class="col-12">
<button class="btn btn-outline-primary btn-sm w-100" onclick="testConnection('cluster1')">
<i class="fas fa-plug"></i> 测试连接
</button>
</div>
</div>
</div>
<!-- Cluster 2 configuration -->
<div class="cluster-config">
<div class="d-flex justify-content-between align-items-center mb-2">
<h6 class="text-success mb-0"><i class="fas fa-server"></i> 集群2 (测试)</h6>
<button type="button" class="btn btn-outline-success btn-sm" onclick="showImportConfigDialog('cluster2')" title="导入YAML配置">
<i class="fas fa-file-import"></i> 导入配置
</button>
</div>
<div class="row mb-2">
<div class="col-12">
<label class="form-label">集群名称</label>
<input type="text" class="form-control form-control-sm" id="cluster2Name" placeholder="测试集群" value="测试集群">
</div>
</div>
<div class="mb-2">
<label class="form-label">Redis节点</label>
<div id="cluster2Nodes">
<div class="node-input">
<input type="text" class="form-control form-control-sm me-2" placeholder="127.0.0.1" value="127.0.0.1" style="flex: 2;">
<input type="number" class="form-control form-control-sm me-2" placeholder="6380" value="6380" style="flex: 1;">
<button type="button" class="btn btn-outline-danger btn-sm" onclick="removeRedisNode(this, 'cluster2')">
<i class="fas fa-minus"></i>
</button>
</div>
</div>
<button type="button" class="btn btn-outline-primary btn-sm mt-2" onclick="addRedisNode('cluster2')">
<i class="fas fa-plus"></i> 添加节点
</button>
</div>
<div class="row mb-2">
<div class="col-12">
<label class="form-label">密码</label>
<input type="password" class="form-control form-control-sm" id="cluster2Password" placeholder="可选">
</div>
</div>
<div class="row mb-2">
<div class="col-4">
<label class="form-label">连接超时(秒)</label>
<input type="number" class="form-control form-control-sm" id="cluster2SocketTimeout" value="3" min="1" max="60">
</div>
<div class="col-4">
<label class="form-label">建立超时(秒)</label>
<input type="number" class="form-control form-control-sm" id="cluster2SocketConnectTimeout" value="3" min="1" max="60">
</div>
<div class="col-4">
<label class="form-label">最大连接数</label>
<input type="number" class="form-control form-control-sm" id="cluster2MaxConnectionsPerNode" value="16" min="1" max="100">
</div>
</div>
<div class="row">
<div class="col-12">
<button class="btn btn-outline-primary btn-sm w-100" onclick="testConnection('cluster2')">
<i class="fas fa-plug"></i> 测试连接
</button>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- Query panel and results display -->
<div class="col-lg-8">
<div class="config-section">
<h4><i class="fas fa-search"></i> 查询配置</h4>
<!-- Query mode selection -->
<div class="card mb-3">
<div class="card-header">
<h6><i class="fas fa-sliders-h"></i> 查询模式</h6>
</div>
<div class="card-body">
<div class="row">
<div class="col-md-6">
<div class="form-check">
<input class="form-check-input" type="radio" name="queryMode" id="randomMode" value="random" checked onchange="toggleQueryMode()">
<label class="form-check-label" for="randomMode">
<strong>随机采样模式</strong>
</label>
</div>
<div id="randomOptions" class="mt-2">
<div class="row">
<div class="col-6">
<label class="form-label">采样数量</label>
<input type="number" class="form-control form-control-sm" id="sampleCount" value="100" min="1" max="10000">
</div>
<div class="col-6">
<label class="form-label">Key模式</label>
<input type="text" class="form-control form-control-sm" id="keyPattern" value="*" placeholder="*">
</div>
</div>
<div class="mt-2">
<label class="form-label">源集群</label>
<select class="form-select form-select-sm" id="sourceCluster">
<option value="cluster1">从集群1获取Key</option>
<option value="cluster2" selected>从集群2获取Key</option>
</select>
</div>
</div>
</div>
<div class="col-md-6">
<div class="form-check">
<input class="form-check-input" type="radio" name="queryMode" id="specifiedMode" value="specified" onchange="toggleQueryMode()">
<label class="form-check-label" for="specifiedMode">
<strong>指定Key模式</strong>
</label>
</div>
<div id="specifiedOptions" class="mt-2" style="display: none;">
<label class="form-label">Key列表 (每行一个)</label>
<textarea class="form-control query-keys" id="specifiedKeys" rows="6" placeholder="输入要查询的Key,每行一个&#10;例如:&#10;user:example1&#10;user:example2&#10;session:abc123"></textarea>
<small class="form-text text-muted">支持大批量Key查询,建议单次不超过1000个</small>
</div>
</div>
</div>
</div>
</div>
<!-- Action buttons -->
<div class="text-center mb-4">
<button type="button" class="btn btn-redis btn-lg me-3" onclick="executeRedisComparison()">
<i class="fas fa-play me-2"></i>开始Redis数据比较
</button>
<button type="button" class="btn btn-outline-secondary btn-lg" onclick="clearResults()">
<i class="fas fa-eraser me-2"></i>清空结果
</button>
</div>
</div>
<!-- Results section -->
<div class="result-section" id="resultSection" style="display: none;">
<!-- Statistics -->
<div class="row" id="stats">
<!-- Stat cards are generated here dynamically -->
</div>
<!-- Result tabs navigation -->
<div class="card mt-4">
<div class="card-header">
<ul class="nav nav-tabs card-header-tabs" id="resultTabs" role="tablist">
<li class="nav-item" role="presentation">
<button class="nav-link active" id="differences-tab" data-bs-toggle="tab" data-bs-target="#differences-panel" type="button" role="tab">
<i class="fas fa-exclamation-triangle"></i> 差异详情 <span class="badge bg-danger ms-1" id="diff-count">0</span>
</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="identical-tab" data-bs-toggle="tab" data-bs-target="#identical-panel" type="button" role="tab">
<i class="fas fa-check-circle"></i> 相同结果 <span class="badge bg-success ms-1" id="identical-count">0</span>
</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="missing-tab" data-bs-toggle="tab" data-bs-target="#missing-panel" type="button" role="tab">
<i class="fas fa-question-circle"></i> 缺失数据 <span class="badge bg-warning ms-1" id="missing-count">0</span>
</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="raw-data-tab" data-bs-toggle="tab" data-bs-target="#raw-data-panel" type="button" role="tab">
<i class="fas fa-database"></i> 原生数据
</button>
</li>
<li class="nav-item" role="presentation">
<button class="nav-link" id="summary-tab" data-bs-toggle="tab" data-bs-target="#summary-panel" type="button" role="tab">
<i class="fas fa-chart-pie"></i> 比较总结
</button>
</li>
</ul>
</div>
<div class="card-body">
<div class="tab-content" id="resultTabContent">
<!-- Differences panel -->
<div class="tab-pane fade show active" id="differences-panel" role="tabpanel">
<div id="differences-content">
<!-- Difference details are generated here dynamically -->
</div>
<div id="differences-pagination" class="d-flex justify-content-center mt-3">
<!-- Pagination is generated here dynamically -->
</div>
</div>
<!-- Identical results panel -->
<div class="tab-pane fade" id="identical-panel" role="tabpanel">
<div id="identical-content">
<!-- Identical results are generated here dynamically -->
</div>
<div id="identical-pagination" class="d-flex justify-content-center mt-3">
<!-- Pagination is generated here dynamically -->
</div>
</div>
<!-- Missing data panel -->
<div class="tab-pane fade" id="missing-panel" role="tabpanel">
<div id="missing-content">
<!-- Missing data entries are generated here dynamically -->
</div>
<div id="missing-pagination" class="d-flex justify-content-center mt-3">
<!-- Pagination is generated here dynamically -->
</div>
</div>
<!-- Raw data panel -->
<div class="tab-pane fade" id="raw-data-panel" role="tabpanel">
<div class="row">
<div class="col-md-6">
<h5><i class="fas fa-server text-primary"></i> 集群1 原生数据</h5>
<div class="mb-3">
<div class="btn-group btn-group-sm" role="group">
<button type="button" class="btn btn-outline-primary active" onclick="switchRawDataView('cluster1', 'formatted')">格式化</button>
<button type="button" class="btn btn-outline-primary" onclick="switchRawDataView('cluster1', 'raw')">原始</button>
<button type="button" class="btn btn-outline-primary" onclick="exportRawData('cluster1')">导出</button>
</div>
</div>
<div id="cluster1-raw-data" class="redis-value" style="max-height: 500px; overflow-y: auto;">
<!-- Cluster 1 raw data is displayed here -->
</div>
</div>
<div class="col-md-6">
<h5><i class="fas fa-server text-success"></i> 集群2 原生数据</h5>
<div class="mb-3">
<div class="btn-group btn-group-sm" role="group">
<button type="button" class="btn btn-outline-success active" onclick="switchRawDataView('cluster2', 'formatted')">格式化</button>
<button type="button" class="btn btn-outline-success" onclick="switchRawDataView('cluster2', 'raw')">原始</button>
<button type="button" class="btn btn-outline-success" onclick="exportRawData('cluster2')">导出</button>
</div>
</div>
<div id="cluster2-raw-data" class="redis-value" style="max-height: 500px; overflow-y: auto;">
<!-- Cluster 2 raw data is displayed here -->
</div>
</div>
</div>
<div class="row mt-4">
<div class="col-12">
<div class="alert alert-info">
<i class="fas fa-info-circle"></i>
<strong>提示:</strong>原生数据显示实际从Redis查询到的数据。格式化视图便于阅读,原始视图保持数据原始格式。
</div>
</div>
</div>
</div>
<!-- Comparison summary panel -->
<div class="tab-pane fade" id="summary-panel" role="tabpanel">
<div id="performanceReport">
<!-- Performance report is generated here dynamically -->
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<!-- Loading indicator -->
<div class="loading" id="loadingIndicator">
<div class="position-fixed top-50 start-50 translate-middle bg-dark text-white p-4 rounded shadow" style="z-index: 9999;">
<div class="d-flex align-items-center">
<div class="spinner-border spinner-border-sm me-3" role="status"></div>
<span id="loadingText">正在执行Redis数据比较,请稍候...</span>
</div>
</div>
</div>
<!-- Import config modal -->
<div class="modal fade" id="importConfigModal" tabindex="-1" aria-labelledby="importConfigModalLabel" aria-hidden="true">
<div class="modal-dialog modal-lg">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="importConfigModalLabel">
<i class="fas fa-file-import redis-logo"></i> 导入Redis配置
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<div class="mb-3">
<h6 class="mb-2">导入方式</h6>
<div class="btn-group w-100" role="group">
<input type="radio" class="btn-check" name="importMethod" id="importMethodText" checked>
<label class="btn btn-outline-primary" for="importMethodText">
<i class="fas fa-keyboard"></i> 文本粘贴
</label>
<input type="radio" class="btn-check" name="importMethod" id="importMethodFile">
<label class="btn btn-outline-primary" for="importMethodFile">
<i class="fas fa-file-upload"></i> 文件上传
</label>
</div>
</div>
<!-- Text paste method -->
<div id="textImportSection">
<div class="mb-3">
<label for="configYamlText" class="form-label">YAML配置内容</label>
<textarea class="form-control" id="configYamlText" rows="8" placeholder="请粘贴YAML格式的配置内容,例如:&#10;clusterName: &quot;redis-example&quot;&#10;clusterAddress: &quot;127.0.0.1:6379&quot;&#10;clusterPassword: &quot;&quot;"></textarea>
</div>
</div>
<!-- File upload method -->
<div id="fileImportSection" style="display: none;">
<div class="mb-3">
<label for="configYamlFile" class="form-label">选择YAML配置文件</label>
<input type="file" class="form-control" id="configYamlFile" accept=".yml,.yaml,.txt">
</div>
<div class="border rounded p-3 text-center" id="dropZone" style="border-style: dashed !important; min-height: 100px; cursor: pointer;">
<i class="fas fa-cloud-upload-alt fa-2x text-muted mb-2"></i>
<p class="text-muted mb-0">点击选择文件或拖拽文件到此处</p>
</div>
</div>
<!-- Config preview -->
<div id="configPreview" style="display: none;">
<hr>
<h6><i class="fas fa-eye"></i> 配置预览</h6>
<div class="alert alert-info">
<div id="previewContent"></div>
</div>
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-outline-secondary me-2" onclick="previewConfig()">
<i class="fas fa-eye"></i> 预览配置
</button>
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">取消</button>
<button type="button" class="btn btn-redis" onclick="importConfig()">
<i class="fas fa-file-import"></i> 导入
</button>
</div>
</div>
</div>
</div>
<!-- Redis query history modal -->
<div class="modal fade" id="redisQueryHistoryModal" tabindex="-1" aria-labelledby="redisQueryHistoryModalLabel" aria-hidden="true">
<div class="modal-dialog modal-xl">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="redisQueryHistoryModalLabel">
<i class="fas fa-history redis-logo"></i> Redis查询历史管理
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<div class="d-flex justify-content-between align-items-center mb-3">
<h6 class="mb-0">历史查询记录</h6>
<div>
<button class="btn btn-sm btn-outline-primary me-2" onclick="refreshRedisQueryHistory()">
<i class="fas fa-sync-alt"></i> 刷新
</button>
<button class="btn btn-sm btn-outline-danger" onclick="clearAllRedisHistory()">
<i class="fas fa-trash"></i> 清空全部
</button>
</div>
</div>
<div id="redisHistoryList" style="max-height: 600px; overflow-y: auto;">
<!-- Redis history list is generated here dynamically -->
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">关闭</button>
</div>
</div>
</div>
</div>
<!-- Redis query logs modal -->
<div class="modal fade" id="redisQueryLogsModal" tabindex="-1" aria-labelledby="redisQueryLogsModalLabel" aria-hidden="true">
<div class="modal-dialog modal-xl">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="redisQueryLogsModalLabel">
<i class="fas fa-file-alt redis-logo"></i> Redis查询日志管理
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<div class="d-flex justify-content-between align-items-center mb-3">
<div class="d-flex align-items-center">
<h6 class="mb-0 me-3">Redis查询执行日志</h6>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="redis-modal-log-level-info" checked onchange="filterRedisModalLogsByLevel()">
<label class="form-check-label text-primary" for="redis-modal-log-level-info">INFO</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="redis-modal-log-level-warning" checked onchange="filterRedisModalLogsByLevel()">
<label class="form-check-label text-warning" for="redis-modal-log-level-warning">WARNING</label>
</div>
<div class="form-check form-check-inline">
<input class="form-check-input" type="checkbox" id="redis-modal-log-level-error" checked onchange="filterRedisModalLogsByLevel()">
<label class="form-check-label text-danger" for="redis-modal-log-level-error">ERROR</label>
</div>
</div>
<div>
<button class="btn btn-sm btn-outline-primary me-2" onclick="refreshRedisQueryLogs()">
<i class="fas fa-sync-alt"></i> 刷新
</button>
<button class="btn btn-sm btn-outline-danger" onclick="clearRedisQueryLogs()">
<i class="fas fa-trash"></i> 清空
</button>
</div>
</div>
<div id="redis-modal-query-logs" style="max-height: 600px; overflow-y: auto;">
<!-- Redis query logs are generated here dynamically -->
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">关闭</button>
</div>
</div>
</div>
</div>
<!-- Import config dialog -->
<div class="modal fade" id="importRedisConfigModal" tabindex="-1" aria-hidden="true">
<div class="modal-dialog modal-lg">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title">导入Redis配置</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal"></button>
</div>
<div class="modal-body">
<div class="mb-3">
<label for="configFormat" class="form-label">配置格式</label>
<select class="form-select" id="configFormat" onchange="updateConfigTemplate()">
<option value="yaml">YAML格式</option>
<option value="json">JSON格式</option>
</select>
</div>
<div class="mb-3">
<label for="configContent" class="form-label">配置内容</label>
<textarea class="form-control" id="configContent" rows="15" placeholder="请粘贴配置内容..."></textarea>
<small class="form-text text-muted">支持YAML和JSON格式的Redis配置</small>
</div>
<div class="mb-3">
<div class="accordion" id="configTemplateAccordion">
<div class="accordion-item">
<h2 class="accordion-header" id="templateHeading">
<button class="accordion-button collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#templateCollapse">
<i class="fas fa-code me-2"></i>配置示例
</button>
</h2>
<div id="templateCollapse" class="accordion-collapse collapse" data-bs-parent="#configTemplateAccordion">
<div class="accordion-body">
<pre id="configTemplate" class="bg-light p-3 rounded" style="font-size: 0.85em;"></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">取消</button>
<button type="button" class="btn btn-primary" onclick="importRedisConfig()">导入配置</button>
</div>
</div>
</div>
</div>
<!-- Save Redis config group modal -->
<div class="modal fade" id="saveRedisConfigModal" tabindex="-1" aria-labelledby="saveRedisConfigModalLabel" aria-hidden="true">
<div class="modal-dialog">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="saveRedisConfigModalLabel">
<i class="fas fa-save redis-logo"></i> 保存Redis配置组
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<form id="saveRedisConfigForm">
<div class="mb-3">
<label for="redisConfigName" class="form-label">配置组名称 <span class="text-danger">*</span></label>
<input type="text" class="form-control" id="redisConfigName" required>
</div>
<div class="mb-3">
<label for="redisConfigDescription" class="form-label">配置描述</label>
<textarea class="form-control" id="redisConfigDescription" rows="3" placeholder="请输入配置组的详细描述..."></textarea>
</div>
</form>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">取消</button>
<button type="button" class="btn btn-redis" onclick="saveRedisConfigGroup()">
<i class="fas fa-save"></i> 保存
</button>
</div>
</div>
</div>
</div>
<!-- Manage Redis config groups modal -->
<div class="modal fade" id="manageRedisConfigModal" tabindex="-1" aria-labelledby="manageRedisConfigModalLabel" aria-hidden="true">
<div class="modal-dialog modal-xl">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="manageRedisConfigModalLabel">
<i class="fas fa-cog redis-logo"></i> 管理Redis配置组
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<div class="d-flex justify-content-between align-items-center mb-3">
<h6 class="mb-0">已保存的配置组</h6>
<button class="btn btn-sm btn-outline-primary" onclick="refreshRedisConfigGroups()">
<i class="fas fa-sync-alt"></i> 刷新
</button>
</div>
<div id="redisConfigGroupsList" style="max-height: 600px; overflow-y: auto;">
<!-- Redis config group list is generated here dynamically -->
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">关闭</button>
</div>
</div>
</div>
</div>
<!-- Save Redis query history modal -->
<div class="modal fade" id="saveRedisHistoryModal" tabindex="-1" aria-labelledby="saveRedisHistoryModalLabel" aria-hidden="true">
<div class="modal-dialog">
<div class="modal-content">
<div class="modal-header">
<h5 class="modal-title" id="saveRedisHistoryModalLabel">
<i class="fas fa-bookmark redis-logo"></i> 保存Redis查询历史
</h5>
<button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
</div>
<div class="modal-body">
<form id="saveRedisHistoryForm">
<div class="mb-3">
<label for="redisHistoryName" class="form-label">历史记录名称 <span class="text-danger">*</span></label>
<input type="text" class="form-control" id="redisHistoryName" required>
</div>
<div class="mb-3">
<label for="redisHistoryDescription" class="form-label">历史描述</label>
<textarea class="form-control" id="redisHistoryDescription" rows="3" placeholder="请输入查询历史的详细描述..."></textarea>
</div>
</form>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">取消</button>
<button type="button" class="btn btn-redis" onclick="saveRedisQueryHistory()">
<i class="fas fa-bookmark"></i> 保存
</button>
</div>
</div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
<script src="/static/js/redis_compare.js"></script>
</body>
</html>