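# Huawei AppGallery Crawler System: Development Guide

> A guide to rebuilding the original Rust project with Python + MySQL + Vue 3

## 📋 Table of Contents

- [1. Project Overview](#1-project-overview)
- [2. System Architecture](#2-system-architecture)
- [3. Data Source Analysis](#3-data-source-analysis)
- [4. Database Design](#4-database-design)
- [5. Backend Development](#5-backend-development)
- [6. Frontend Development](#6-frontend-development)
- [7. Deployment Guide](#7-deployment-guide)

---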

## 1. Project Overview

### 1.1 Project Goals

Build a data collection and visualization system for Huawei AppGallery that:
- Automatically crawls app information from Huawei AppGallery
- Stores each app's basic information, version history, download counts, ratings, and related data
- Provides a web UI for statistics, leaderboards, and trend analysis
- Lets users search, filter, and submit apps

### 1.2 Technology Stack

**Backend:**
- Python 3.10+
- FastAPI (web framework)
- SQLAlchemy (ORM)
- MySQL 8.0+
- APScheduler (scheduled jobs)
- httpx / aiohttp (async HTTP clients)

**Frontend:**
- Vue 3 + TypeScript
- Vite (build tool)
- Element Plus / Ant Design Vue (UI component library)
- ECharts / Chart.js (charting)
- Axios (HTTP client)
- Pinia (state management)

**Deployment:**
- Docker + Docker Compose
- Nginx (reverse proxy)
- Gunicorn / Uvicorn (ASGI server; Gunicorn needs Uvicorn workers to serve an ASGI app)

---

## 2. System Architecture

### 2.1 Overall Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        User browser                         │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP/HTTPS
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Nginx (reverse proxy)                    │
└──────────┬──────────────────────────────────┬───────────────┘
           │                                  │
           │ /api/*                           │ /*
           ▼                                  ▼
┌──────────────────────┐          ┌──────────────────────────┐
│ FastAPI backend      │          │ Vue 3 static frontend    │
│  - REST API          │          │  - SPA                   │
│  - Data queries      │          │  - Data visualization    │
│  - Crawler control   │          └──────────────────────────┘
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐          ┌──────────────────────────┐
│ MySQL database       │◄─────────│ Crawler scheduler        │
│  - App info          │          │  - APScheduler           │
│  - History data      │          │  - Periodic sync         │
│  - Statistics        │          │  - Batch processing      │
└──────────────────────┘          └──────────┬───────────────┘
                                             │
                                             ▼
                                  ┌──────────────────────────┐
                                  │ Huawei AppGallery API    │
                                  │  - App info endpoint     │
                                  │  - Rating endpoint       │
                                  └──────────────────────────┘
```

### 2.2 Core Modules

1. **Crawler module** - fetches data from the Huawei API
2. **Data processing module** - cleans, deduplicates, and persists data
3. **API service module** - exposes a RESTful API
4. **Scheduling module** - runs scheduled jobs and batch processing
5. **Frontend module** - data visualization and user interaction
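
The batch processing done by the scheduling module could look roughly like the sketch below. It assumes `fetch` is any coroutine that fetches one package, for example a wrapper around `HuaweiAPI.get_app_info`. The helper names `chunked` and `crawl_in_batches` are hypothetical, and the default concurrency of 5 is an arbitrary placeholder rather than a value from the original project.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


async def crawl_in_batches(
    pkg_names: Iterable[str],
    fetch: Callable[[str], Awaitable[Any]],
    batch_size: int = 100,
    concurrency: int = 5,
) -> List[Any]:
    """Fetch every package batch by batch, with at most `concurrency` requests in flight.

    Exceptions are returned in place of results, so one failing app does not abort the run.
    """
    sem = asyncio.Semaphore(concurrency)

    async def guarded(name: str) -> Any:
        async with sem:
            return await fetch(name)

    results: List[Any] = []
    for batch in chunked(pkg_names, batch_size):
        # gather preserves input order; return_exceptions keeps failures as values
        results.extend(await asyncio.gather(*(guarded(n) for n in batch), return_exceptions=True))
    return results
```

An APScheduler job could call `crawl_in_batches` every `CRAWLER_INTERVAL` seconds, passing `CRAWLER_BATCH_SIZE` as `batch_size`. The caller would then log or retry any entries that are exceptions.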

---

## 3. Data Source Analysis

### 3.1 Huawei AppGallery API

**Basics:**
- API base URL: `https://web-drcn.hispace.dbankcloud.com/edge`
- Requests need authentication tokens (`interface-code` and `identity-id`), which must be obtained dynamically
- Tokens are valid for roughly 10 minutes and must be refreshed periodically

### 3.2 Main Endpoints

#### 3.2.1 Get App Basic Information

**Endpoint:** `POST /webedge/appinfo`

**Request headers:**
```http
Content-Type: application/json
User-Agent: HuaweiMarketCrawler/1.0
interface-code: {dynamically obtained token}
identity-id: {dynamically obtained token}
```

**Request body (query by package name):**
```json
{
  "pkgName": "com.huawei.hmsapp.appgallery",
  "locale": "zh_CN"
}
```

**Request body (query by app ID):**
```json
{
  "appId": "C1164531384803416384",
  "locale": "zh_CN"
}
```

**Sample response:**
```json
{
  "appId": "C1164531384803416384",
  "name": "应用市场",
  "pkgName": "com.huawei.hmsapp.appgallery",
  "devId": "260086000000068459",
  "developerName": "华为软件技术有限公司",
  "devEnName": "Huawei Software Technologies Co., Ltd.",
  "kindName": "工具",
  "version": "6.3.2.302",
  "size": 76591487,
  "downCount": "14443706",
  "rateNum": "125000",
  "hot": "4.5",
  "icon": "https://...",
  "briefDes": "应用市场,点亮精彩生活",
  "description": "...",
  "releaseDate": 1234567890000,
  "targetSdk": "12",
  "minsdk": "9",
  ...
}
```
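
The header and body rules above fit into one small helper. In this sketch, `build_appinfo_request` is a hypothetical name, and the token dict keys (`interface_code`, `identity_id`) mirror the token manager in 5.2.5. Requiring exactly one of `pkg_name` and `app_id` is a stricter rule than the wrapper in 5.2.4, which prefers `pkg_name` when both are given.

```python
from typing import Dict, Optional, Tuple

# Base URL and path as documented in this section
BASE_URL = "https://web-drcn.hispace.dbankcloud.com/edge"


def build_appinfo_request(
    tokens: Dict[str, str],
    pkg_name: Optional[str] = None,
    app_id: Optional[str] = None,
    locale: str = "zh_CN",
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, body) for POST /webedge/appinfo."""
    if bool(pkg_name) == bool(app_id):
        raise ValueError("pass exactly one of pkg_name or app_id")
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "HuaweiMarketCrawler/1.0",
        "interface-code": tokens["interface_code"],
        "identity-id": tokens["identity_id"],
    }
    body = {"locale": locale}
    if pkg_name:
        body["pkgName"] = pkg_name
    else:
        body["appId"] = app_id
    return f"{BASE_URL}/webedge/appinfo", headers, body
```

The request can then be sent with `httpx.AsyncClient.post(url, headers=headers, json=body)`.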

#### 3.2.2 Get App Rating Details

**Endpoint:** `POST /harmony/page-detail`

**Request body:**
```json
{
  "pageId": "webAgAppDetail|C1164531384803416384",
  "pageNum": 1,
  "pageSize": 100,
  "zone": ""
}
```

**Sample response:**
```json
{
  "pages": [{
    "data": {
      "cardlist": {
        "layoutData": [{
          "type": "fl.card.comment",
          "data": [{
            "starInfo": "{\"averageRating\":\"4.5\",\"oneStarRatingCount\":100,\"twoStarRatingCount\":200,...}"
          }]
        }]
      }
    }
  }]
}
```

`starInfo` is itself a JSON-encoded string, so it must be decoded a second time after the response has been parsed.
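
A defensive extractor could walk the structure shown above and return `None` whenever part of it is missing. The sketch below uses a hypothetical function name, `extract_star_info`.

```python
import json
from typing import Any, Dict, Optional


def extract_star_info(resp: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pull the rating summary out of a /harmony/page-detail response.

    Walks pages[0].data.cardlist.layoutData, takes the first card whose type is
    "fl.card.comment", and decodes its JSON-encoded starInfo string.
    """
    try:
        layouts = resp["pages"][0]["data"]["cardlist"]["layoutData"]
    except (KeyError, IndexError, TypeError):
        return None
    for card in layouts:
        if card.get("type") != "fl.card.comment":
            continue
        items = card.get("data") or []
        if items and isinstance(items[0].get("starInfo"), str):
            # Second decode: starInfo is a JSON string inside the JSON response
            return json.loads(items[0]["starInfo"])
    return None
```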

### 3.3 Token Acquisition Strategy

The tokens must be obtained dynamically from Huawei's web frontend. Possible approaches:

1. **Option 1:** Drive a real browser with Selenium/Playwright and capture the tokens
2. **Option 2:** Reverse-engineer the site's JavaScript and reimplement the token-generation algorithm
3. **Option 3:** Update the tokens manually on a schedule (not recommended)

**Reference implementation (Option 1, Playwright):**
```python
import asyncio
from typing import Dict

from playwright.async_api import async_playwright


async def get_huawei_token() -> Dict[str, str]:
    tokens: Dict[str, str] = {}

    # Capture the token headers from any outgoing request that carries them
    def handle_request(request):
        headers = request.headers  # Playwright lower-cases header names
        if "interface-code" in headers:
            tokens["interface_code"] = headers["interface-code"]
            tokens["identity_id"] = headers.get("identity-id", "")

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            page.on("request", handle_request)
            await page.goto("https://appgallery.huawei.com/")
            # Give the SPA time to fire its API calls
            await page.wait_for_timeout(3000)
        finally:
            await browser.close()

    return tokens


if __name__ == "__main__":
    print(asyncio.run(get_huawei_token()))
```

### 3.4 Field Reference

**Core fields:**
- `appId` - unique app identifier (IDs longer than 15 characters indicate HarmonyOS apps)
- `pkgName` - package name (unique)
- `name` - app name
- `developerName` - developer name
- `downCount` - download count (a string, possibly with a suffix, e.g. "1000000+")
- `rateNum` - number of ratings
- `hot` - popularity score
- `version` - version string
- `size` - app size in bytes
- `releaseDate` - release time (millisecond timestamp)
- `targetSdk` / `minsdk` - target / minimum SDK version

**Caveats:**
1. Some fields may be empty and need default values
2. Download counts may contain a "+" sign that must be stripped
3. Some apps (atomic services) have package names starting with `com.atomicservice` and have no rating data
4. The JSON may contain `\0` (NUL) characters, which must be removed
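
The caveats above translate into a few small normalization helpers. These sketches rest on some assumptions:
- Download counts are plain digits with an optional trailing `+`; anything else maps to 0.
- Timestamps are converted to naive UTC `datetime` values for MySQL `DATETIME` columns.
- All helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_download_count(raw: Any) -> int:
    """Convert values such as "14443706" or "1000000+" to int; unparsable values become 0."""
    text = str(raw if raw is not None else "").strip().rstrip("+").replace(",", "")
    return int(text) if text.isdecimal() else 0


def is_atomic_service(pkg_name: str) -> bool:
    """Atomic services (which have no rating data) use the com.atomicservice prefix."""
    return pkg_name.startswith("com.atomicservice")


def strip_nul(value: Optional[str]) -> str:
    """Remove NUL characters; a missing value becomes the empty-string default."""
    return (value or "").replace("\x00", "")


def ms_to_datetime(ms: Any) -> datetime:
    """Millisecond timestamp (int or numeric string) -> naive UTC datetime."""
    seconds = int(ms or 0) / 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)
```

Storing UTC avoids depending on the crawler host's time zone. The processor in 5.2.6 uses `datetime.fromtimestamp`, which yields local time, so choose one convention and apply it everywhere.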

---

## 4. Database Design

### 4.1 MySQL Table Schemas

#### 4.1.1 App Basic Information Table (app_info)

```sql
CREATE TABLE `app_info` (
  `app_id` VARCHAR(50) PRIMARY KEY COMMENT 'Unique app ID',
  `alliance_app_id` VARCHAR(50) COMMENT 'Alliance app ID',
  `name` VARCHAR(255) NOT NULL COMMENT 'App name',
  `pkg_name` VARCHAR(255) NOT NULL UNIQUE COMMENT 'Package name',
  `dev_id` VARCHAR(50) NOT NULL COMMENT 'Developer ID',
  `developer_name` VARCHAR(255) NOT NULL COMMENT 'Developer name',
  `dev_en_name` VARCHAR(255) COMMENT 'Developer name (English)',
  `supplier` VARCHAR(255) COMMENT 'Supplier name',
  `kind_id` INT NOT NULL COMMENT 'Category ID',
  `kind_name` VARCHAR(100) NOT NULL COMMENT 'Category name',
  `tag_name` VARCHAR(255) COMMENT 'Tag name',
  `kind_type_id` INT NOT NULL COMMENT 'Type ID',
  `kind_type_name` VARCHAR(100) NOT NULL COMMENT 'Type name',
  `icon_url` TEXT NOT NULL COMMENT 'Icon URL',
  `brief_desc` TEXT NOT NULL COMMENT 'Short description',
  `description` LONGTEXT NOT NULL COMMENT 'Full description',
  `privacy_url` TEXT NOT NULL COMMENT 'Privacy policy URL',
  `ctype` INT NOT NULL COMMENT 'Client type',
  `detail_id` VARCHAR(100) NOT NULL COMMENT 'Detail page ID',
  `app_level` INT NOT NULL COMMENT 'App level',
  `jocat_id` INT NOT NULL COMMENT 'Category ID (jocat)',
  `iap` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Has in-app purchases',
  `hms` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Depends on HMS',
  `tariff_type` VARCHAR(50) NOT NULL COMMENT 'Tariff type',
  `packing_type` INT NOT NULL COMMENT 'Packaging type',
  `order_app` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Pre-installed app',
  `denpend_gms` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Depends on GMS',
  `denpend_hms` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Depends on HMS',
  `force_update` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Forced update',
  `img_tag` VARCHAR(50) NOT NULL COMMENT 'Image tag',
  `is_pay` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Paid app',
  `is_disciplined` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Compliance flag',
  `is_shelves` TINYINT(1) NOT NULL DEFAULT 1 COMMENT 'Listed (on shelves)',
  `submit_type` INT NOT NULL DEFAULT 0 COMMENT 'Submission type',
  `delete_archive` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Deleted/archived',
  `charging` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Charges a fee',
  `button_grey` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Install button greyed out',
  `app_gift` TINYINT(1) NOT NULL DEFAULT 0 COMMENT 'Has gift packs',
  `free_days` INT NOT NULL DEFAULT 0 COMMENT 'Free days',
  `pay_install_type` INT NOT NULL DEFAULT 0 COMMENT 'Paid install type',
  `comment` JSON COMMENT 'Comment/annotation data',
  `listed_at` DATETIME NOT NULL COMMENT 'Listing time',
  `release_countries` JSON COMMENT 'Countries/regions where the app is released',
  `main_device_codes` JSON COMMENT 'Main supported device types',
  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
  `updated_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
  -- pkg_name is already indexed by its UNIQUE constraint, so no separate index is needed
  INDEX `idx_developer_name` (`developer_name`),
  INDEX `idx_kind_name` (`kind_name`),
  INDEX `idx_created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='App basic information';
```

#### 4.1.2 App Metrics Table (app_metrics)

```sql
CREATE TABLE `app_metrics` (
  `id` BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Primary key',
  `app_id` VARCHAR(50) NOT NULL COMMENT 'App ID',
  `pkg_name` VARCHAR(255) NOT NULL COMMENT 'Package name',
  `version` VARCHAR(50) NOT NULL COMMENT 'Version name',
  `version_code` BIGINT NOT NULL COMMENT 'Version code',
  `size_bytes` BIGINT NOT NULL COMMENT 'App size (bytes)',
  `sha256` VARCHAR(64) NOT NULL COMMENT 'SHA-256 of the install package',
  `info_score` DECIMAL(3,1) NOT NULL COMMENT 'Score from the app info endpoint',
  `info_rate_count` BIGINT NOT NULL COMMENT 'Rating count from the app info endpoint',
  `download_count` BIGINT NOT NULL COMMENT 'Download count',
  `price` DECIMAL(10,2) NOT NULL DEFAULT 0.00 COMMENT 'Price',
  `release_date` BIGINT NOT NULL COMMENT 'Release time (ms timestamp)',
  `new_features` TEXT COMMENT 'New features',
  `upgrade_msg` TEXT COMMENT 'Upgrade message',
  `target_sdk` VARCHAR(20) NOT NULL COMMENT 'Target SDK version',
  `min_sdk` VARCHAR(20) NOT NULL COMMENT 'Minimum SDK version',
  `compile_sdk_version` INT DEFAULT 0 COMMENT 'Compile SDK version',
  `min_hmos_api_level` INT DEFAULT 0 COMMENT 'Minimum HarmonyOS API level',
  `api_release_type` VARCHAR(50) DEFAULT 'Release' COMMENT 'API release type',
  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
  FOREIGN KEY (`app_id`) REFERENCES `app_info`(`app_id`) ON DELETE CASCADE,
  FOREIGN KEY (`pkg_name`) REFERENCES `app_info`(`pkg_name`) ON DELETE CASCADE,
  INDEX `idx_app_id` (`app_id`),
  INDEX `idx_pkg_name` (`pkg_name`),
  INDEX `idx_download_count` (`download_count`),
  INDEX `idx_created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='App metrics';
```
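
Because `app_metrics` stores snapshots, a new row only needs to be written when something tracked has changed. The processor in 5.2.6 calls `_should_save_metric` without showing its body. The sketch below shows one plausible rule, an assumption rather than the original project's logic: compare the new values against the latest stored row on a chosen set of columns. `metric_changed` and `TRACKED_FIELDS` are hypothetical names.

```python
from typing import Any, Iterable, Mapping, Optional

# Hypothetical choice of columns whose change justifies a new snapshot row
TRACKED_FIELDS = ("version", "version_code", "size_bytes", "info_score",
                  "info_rate_count", "download_count")


def metric_changed(
    latest: Optional[Mapping[str, Any]],
    new: Mapping[str, Any],
    fields: Iterable[str] = TRACKED_FIELDS,
) -> bool:
    """Return True if there is no previous row or any tracked field differs.

    Values are compared as strings so that Decimal("4.5") read from MySQL
    equals 4.5 parsed from the API response.
    """
    if latest is None:
        return True
    return any(str(latest.get(f)) != str(new.get(f)) for f in fields)
```

In SQLAlchemy, `latest` would typically come from `select(AppMetrics).where(AppMetrics.app_id == app_id).order_by(AppMetrics.created_at.desc()).limit(1)`.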

#### 4.1.3 App Rating Table (app_rating)

```sql
CREATE TABLE `app_rating` (
  `id` BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Primary key',
  `app_id` VARCHAR(50) NOT NULL COMMENT 'App ID',
  `pkg_name` VARCHAR(255) NOT NULL COMMENT 'Package name',
  `average_rating` DECIMAL(3,2) NOT NULL COMMENT 'Average rating',
  `star_1_count` INT NOT NULL DEFAULT 0 COMMENT 'Number of 1-star ratings',
  `star_2_count` INT NOT NULL DEFAULT 0 COMMENT 'Number of 2-star ratings',
  `star_3_count` INT NOT NULL DEFAULT 0 COMMENT 'Number of 3-star ratings',
  `star_4_count` INT NOT NULL DEFAULT 0 COMMENT 'Number of 4-star ratings',
  `star_5_count` INT NOT NULL DEFAULT 0 COMMENT 'Number of 5-star ratings',
  `total_rating_count` INT NOT NULL DEFAULT 0 COMMENT 'Total number of ratings',
  `only_star_count` INT NOT NULL DEFAULT 0 COMMENT 'Number of star-only ratings',
  `full_average_rating` VARCHAR(20) COMMENT 'Full-precision average rating',
  `source_type` VARCHAR(50) COMMENT 'Source type',
  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
  FOREIGN KEY (`app_id`) REFERENCES `app_info`(`app_id`) ON DELETE CASCADE,
  FOREIGN KEY (`pkg_name`) REFERENCES `app_info`(`pkg_name`) ON DELETE CASCADE,
  INDEX `idx_app_id` (`app_id`),
  INDEX `idx_pkg_name` (`pkg_name`),
  INDEX `idx_average_rating` (`average_rating`),
  INDEX `idx_created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='App ratings';
```
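
The decoded `starInfo` object could be mapped onto `app_rating` columns as below. Only `averageRating`, `oneStarRatingCount`, and `twoStarRatingCount` appear in the sample response in 3.2.2. The sketch makes three assumptions:
- The three-to-five-star key names follow the same pattern as the first two.
- `total_rating_count` is computed from the per-star counts.
- The function name `star_info_to_row` is hypothetical.

```python
from decimal import Decimal
from typing import Any, Dict, Mapping

# Assumed key names; only the first two star keys appear in the sample payload
STAR_KEYS = {
    1: "oneStarRatingCount",
    2: "twoStarRatingCount",
    3: "threeStarRatingCount",
    4: "fourStarRatingCount",
    5: "fiveStarRatingCount",
}


def star_info_to_row(app_id: str, pkg_name: str, star_info: Mapping[str, Any]) -> Dict[str, Any]:
    """Build an app_rating row dict from a decoded starInfo object."""
    row: Dict[str, Any] = {"app_id": app_id, "pkg_name": pkg_name}
    for stars, key in STAR_KEYS.items():
        row[f"star_{stars}_count"] = int(star_info.get(key) or 0)
    # averageRating arrives as a string such as "4.5"; quantize to match DECIMAL(3,2)
    avg = Decimal(str(star_info.get("averageRating") or "0"))
    row["average_rating"] = avg.quantize(Decimal("0.01"))
    row["total_rating_count"] = sum(row[f"star_{s}_count"] for s in STAR_KEYS)
    return row
```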

#### 4.1.4 Raw Data History Table (app_data_history)

```sql
CREATE TABLE `app_data_history` (
  `id` BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Primary key',
  `app_id` VARCHAR(50) NOT NULL COMMENT 'App ID',
  `pkg_name` VARCHAR(255) NOT NULL COMMENT 'Package name',
  `raw_json_data` JSON NOT NULL COMMENT 'Raw app data JSON',
  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
  FOREIGN KEY (`app_id`) REFERENCES `app_info`(`app_id`) ON DELETE CASCADE,
  FOREIGN KEY (`pkg_name`) REFERENCES `app_info`(`pkg_name`) ON DELETE CASCADE,
  INDEX `idx_app_id` (`app_id`),
  INDEX `idx_created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='Raw data history';
```

#### 4.1.5 Rating History Table (app_rating_history)

```sql
CREATE TABLE `app_rating_history` (
  `id` BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Primary key',
  `app_id` VARCHAR(50) NOT NULL COMMENT 'App ID',
  `pkg_name` VARCHAR(255) NOT NULL COMMENT 'Package name',
  `raw_json_rating` JSON NOT NULL COMMENT 'Raw rating data JSON',
  `created_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created at',
  FOREIGN KEY (`app_id`) REFERENCES `app_info`(`app_id`) ON DELETE CASCADE,
  FOREIGN KEY (`pkg_name`) REFERENCES `app_info`(`pkg_name`) ON DELETE CASCADE,
  INDEX `idx_app_id` (`app_id`),
  INDEX `idx_created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='Rating history';
```
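
The `_clean_data` method in 5.2.4 only strips NUL characters from top-level string fields, but the history tables store the whole payload. A recursive pass like the one below could be applied before writing `raw_json_data` or `raw_json_rating`. This is a sketch, and the helper names are hypothetical.

```python
import json
from typing import Any


def strip_nul_deep(value: Any) -> Any:
    """Recursively remove NUL characters from every string (keys included) in a JSON-like value."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul_deep(v) for v in value]
    if isinstance(value, dict):
        return {strip_nul_deep(k): strip_nul_deep(v) for k, v in value.items()}
    return value


def to_raw_json(payload: Any) -> str:
    """Serialize a cleaned payload, keeping non-ASCII text (e.g. Chinese app names) readable."""
    return json.dumps(strip_nul_deep(payload), ensure_ascii=False)
```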
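
### 4.2 Index Optimization Suggestions

1. **Composite indexes:**
   - `(pkg_name, created_at)` on the metrics and history tables - for fetching an app's history by package name
   - `(developer_name, download_count)` - for developer rankings
   - `(kind_name, download_count)` - for category rankings

   `developer_name` and `kind_name` live in `app_info`, while `download_count` lives in `app_metrics`, so the last two cannot be single-table indexes as written. Either copy the latest download count into `app_info` (or a summary table), or index the join path, e.g. `app_metrics (app_id, created_at)`.

2. **Full-text index:**
   - `name`, `brief_desc` - for app search. InnoDB's default full-text parser splits words on whitespace, which does not work for Chinese text. Declare the index `WITH PARSER ngram` (built into MySQL since 5.7.6).

3. **Partitioning:**
   - Partition the history tables by month to speed up queries. InnoDB does not support foreign keys on partitioned tables, and every unique key (including the primary key) must contain the partitioning column. The history tables would therefore need to drop their foreign keys and use a composite primary key such as `(id, created_at)`.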
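
If the history tables are partitioned by month, the `PARTITION BY` clause can be generated rather than written by hand. The sketch below makes two assumptions: `RANGE` partitioning on `TO_DAYS(created_at)`, and a catch-all `pmax` partition. The function names are hypothetical. New months would still need to be split off over time, for example with `ALTER TABLE ... REORGANIZE PARTITION pmax`.

```python
from datetime import date
from typing import List


def add_months(d: date, n: int) -> date:
    """First day of the month that is n months after d's month."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def monthly_partition_clause(start: date, months: int, column: str = "created_at") -> str:
    """Build a PARTITION BY RANGE (TO_DAYS(...)) clause with one partition per month."""
    parts: List[str] = []
    for i in range(months):
        month_start = add_months(start, i)
        upper = add_months(start, i + 1)  # exclusive upper bound: first day of next month
        parts.append(
            f"  PARTITION p{month_start:%Y%m} VALUES LESS THAN (TO_DAYS('{upper:%Y-%m-%d}'))"
        )
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    return f"PARTITION BY RANGE (TO_DAYS(`{column}`)) (\n" + ",\n".join(parts) + "\n)"
```

The returned text is meant to be appended after the closing parenthesis of a `CREATE TABLE` statement.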

---

## 5. Backend Development

### 5.1 Project Structure

```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application entry point
│   ├── config.py               # Configuration
│   ├── database.py             # Database connection
│   ├── models/                 # SQLAlchemy models
│   │   ├── __init__.py
│   │   ├── app_info.py
│   │   ├── app_metrics.py
│   │   ├── app_rating.py
│   │   ├── app_data_history.py
│   │   └── app_rating_history.py
│   ├── schemas/                # Pydantic models
│   │   ├── __init__.py
│   │   ├── app.py
│   │   └── response.py
│   ├── api/                    # API routes
│   │   ├── __init__.py
│   │   ├── apps.py
│   │   ├── rankings.py
│   │   ├── charts.py
│   │   └── submit.py
│   ├── crawler/                # Crawler module
│   │   ├── __init__.py
│   │   ├── huawei_api.py       # Huawei API wrapper
│   │   ├── token_manager.py    # Token management
│   │   └── data_processor.py   # Data processing
│   ├── scheduler/              # Scheduling module
│   │   ├── __init__.py
│   │   └── tasks.py
│   └── utils/                  # Utility functions
│       ├── __init__.py
│       └── helpers.py
├── requirements.txt
├── .env.example
└── README.md
```

### 5.2 Core Implementation

#### 5.2.1 Configuration (config.py)

```python
from typing import List
from urllib.parse import quote_plus

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Values can be overridden by environment variables or the .env file
    model_config = SettingsConfigDict(env_file=".env")

    # Database
    MYSQL_HOST: str = "localhost"
    MYSQL_PORT: int = 3306
    MYSQL_USER: str = "root"
    MYSQL_PASSWORD: str = "password"
    MYSQL_DATABASE: str = "huawei_market"

    # Huawei API
    HUAWEI_API_BASE_URL: str = "https://web-drcn.hispace.dbankcloud.com/edge"
    HUAWEI_LOCALE: str = "zh_CN"

    # Crawler
    CRAWLER_INTERVAL: int = 1800   # sync interval (seconds)
    CRAWLER_BATCH_SIZE: int = 100  # batch size
    CRAWLER_TIMEOUT: int = 30      # request timeout (seconds)

    # API
    API_PREFIX: str = "/api"
    API_TITLE: str = "Huawei AppGallery Data API"
    API_VERSION: str = "1.0.0"

    # Miscellaneous
    DEBUG: bool = False
    CORS_ORIGINS: List[str] = ["http://localhost:5173", "http://localhost:3000"]

    @property
    def database_url(self) -> str:
        # URL-encode credentials so characters like '@' or '/' in the password don't break the URL
        user = quote_plus(self.MYSQL_USER)
        password = quote_plus(self.MYSQL_PASSWORD)
        return (
            f"mysql+aiomysql://{user}:{password}"
            f"@{self.MYSQL_HOST}:{self.MYSQL_PORT}/{self.MYSQL_DATABASE}?charset=utf8mb4"
        )


settings = Settings()
```
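
Credentials containing characters such as `@`, `/`, or `:` break a database URL built by plain string interpolation. The standalone sketch below builds the URL with `urllib.parse.quote_plus` and can be tested without pydantic. `build_mysql_url` is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def build_mysql_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build an SQLAlchemy aiomysql URL with URL-encoded credentials."""
    return (
        f"mysql+aiomysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}?charset=utf8mb4"
    )
```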

#### 5.2.2 Database Connection (database.py)

```python
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import declarative_base

from app.config import settings

# Async engine
engine = create_async_engine(
    settings.database_url,
    echo=settings.DEBUG,
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True,
)

# Async session factory (SQLAlchemy 2.0 style)
AsyncSessionLocal = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

# Declarative base class for all models
Base = declarative_base()


# FastAPI dependency: one session per request; `async with` closes it on exit
async def get_db():
    async with AsyncSessionLocal() as session:
        yield session
```

#### 5.2.3 Data Model (models/app_info.py)

```python
from sqlalchemy import JSON, Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.dialects.mysql import LONGTEXT
from sqlalchemy.sql import func

from app.database import Base


class AppInfo(Base):
    __tablename__ = "app_info"

    app_id = Column(String(50), primary_key=True, comment="Unique app ID")
    alliance_app_id = Column(String(50), comment="Alliance app ID")
    name = Column(String(255), nullable=False, comment="App name")
    pkg_name = Column(String(255), nullable=False, unique=True, index=True, comment="Package name")
    dev_id = Column(String(50), nullable=False, comment="Developer ID")
    developer_name = Column(String(255), nullable=False, index=True, comment="Developer name")
    dev_en_name = Column(String(255), comment="Developer name (English)")
    supplier = Column(String(255), comment="Supplier name")
    kind_id = Column(Integer, nullable=False, comment="Category ID")
    kind_name = Column(String(100), nullable=False, index=True, comment="Category name")
    tag_name = Column(String(255), comment="Tag name")
    kind_type_id = Column(Integer, nullable=False, comment="Type ID")
    kind_type_name = Column(String(100), nullable=False, comment="Type name")
    icon_url = Column(Text, nullable=False, comment="Icon URL")
    brief_desc = Column(Text, nullable=False, comment="Short description")
    # LONGTEXT to match the DDL; plain Text maps to MySQL TEXT (64 KB limit)
    description = Column(LONGTEXT, nullable=False, comment="Full description")
    privacy_url = Column(Text, nullable=False, comment="Privacy policy URL")

    # NOTE: this model shows only a subset of the columns in 4.1.1. ctype, detail_id,
    # app_level, jocat_id, tariff_type, packing_type and img_tag are NOT NULL without
    # defaults in the DDL, so the full model must declare them (and the data processor
    # must fill them), or inserts will fail.

    # Boolean columns
    iap = Column(Boolean, default=False, comment="Has in-app purchases")
    hms = Column(Boolean, default=False, comment="Depends on HMS")
    is_pay = Column(Boolean, default=False, comment="Paid app")
    is_shelves = Column(Boolean, default=True, comment="Listed (on shelves)")

    # JSON columns
    comment = Column(JSON, comment="Comment/annotation data")
    release_countries = Column(JSON, comment="Countries/regions where the app is released")
    main_device_codes = Column(JSON, comment="Main supported device types")

    # Timestamps
    listed_at = Column(DateTime, nullable=False, comment="Listing time")
    created_at = Column(DateTime, nullable=False, server_default=func.now(), comment="Created at")
    updated_at = Column(DateTime, nullable=False, server_default=func.now(), onupdate=func.now(), comment="Updated at")
```

#### 5.2.4 Huawei API Wrapper (crawler/huawei_api.py)

```python
import json
import logging
from typing import Any, Dict, Optional

import httpx

from app.config import settings
from app.crawler.token_manager import TokenManager

logger = logging.getLogger(__name__)


class HuaweiAPI:
    def __init__(self):
        self.base_url = settings.HUAWEI_API_BASE_URL
        self.locale = settings.HUAWEI_LOCALE
        self.token_manager = TokenManager()
        self.client = httpx.AsyncClient(timeout=settings.CRAWLER_TIMEOUT)

    async def _headers(self) -> Dict[str, str]:
        """Build request headers with a currently valid token."""
        tokens = await self.token_manager.get_token()
        return {
            "Content-Type": "application/json",
            "User-Agent": "HuaweiMarketCrawler/1.0",
            "interface-code": tokens["interface_code"],
            "identity-id": tokens["identity_id"],
        }

    async def get_app_info(self, pkg_name: Optional[str] = None, app_id: Optional[str] = None) -> Dict[str, Any]:
        """Fetch an app's basic information by package name or app ID."""
        if not pkg_name and not app_id:
            raise ValueError("Either pkg_name or app_id must be provided")

        url = f"{self.base_url}/webedge/appinfo"
        body: Dict[str, Any] = {"locale": self.locale}
        if pkg_name:
            body["pkgName"] = pkg_name
        else:
            body["appId"] = app_id

        response = await self.client.post(url, headers=await self._headers(), json=body)
        response.raise_for_status()

        # Clean the payload before returning it
        return self._clean_data(response.json())

    async def get_app_rating(self, app_id: str, pkg_name: Optional[str] = None) -> Optional[Dict[str, Any]]:
        """Fetch the rating summary (decoded starInfo) for an app."""
        # Atomic services are recognized by their *package name* and have no rating data
        if pkg_name and pkg_name.startswith("com.atomicservice"):
            return None

        url = f"{self.base_url}/harmony/page-detail"
        body = {
            "pageId": f"webAgAppDetail|{app_id}",
            "pageNum": 1,
            "pageSize": 100,
            "zone": "",
        }

        try:
            response = await self.client.post(url, headers=await self._headers(), json=body)
            response.raise_for_status()
            data = response.json()

            # Find the comment card and decode its JSON-encoded starInfo string
            layouts = data["pages"][0]["data"]["cardlist"]["layoutData"]
            comment_cards = [card for card in layouts if card.get("type") == "fl.card.comment"]
            if not comment_cards:
                return None

            star_info_str = comment_cards[0]["data"][0]["starInfo"]
            return json.loads(star_info_str)

        except (httpx.HTTPError, KeyError, IndexError, TypeError, ValueError) as e:
            logger.warning("Failed to fetch rating for %s: %s", app_id, e)
            return None

    def _clean_data(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Clean a raw app-info payload."""
        # Strip NUL characters from top-level string fields
        for key, value in data.items():
            if isinstance(value, str):
                data[key] = value.replace("\x00", "")

        # Drop the tracing field
        data.pop("AG-TraceId", None)

        # App IDs shorter than 15 characters are treated as Android apps and rejected
        if len(data.get("appId", "")) < 15:
            raise ValueError("appId is shorter than 15 characters; probably an Android app")

        return data

    async def close(self):
        """Close the HTTP client."""
        await self.client.aclose()
```
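
The wrapper above makes a single attempt per call. A hypothetical `retry_async` helper, sketched here with the standard library only, could wrap calls such as `lambda: api.get_app_info(pkg_name=...)` so that transient network errors are retried with exponential backoff and jitter. The attempt count and delays are placeholders. In practice, `retry_on` would be narrowed to something like `(httpx.TransportError,)`.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await func(); on a listed exception, wait base_delay * 2**n (+ jitter) and retry."""
    for attempt in range(attempts):
        try:
            return await func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("attempts must be >= 1")
```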

#### 5.2.5 Token Manager (crawler/token_manager.py)

```python
import asyncio
import logging
from datetime import datetime, timedelta
from typing import Dict

from playwright.async_api import async_playwright

logger = logging.getLogger(__name__)

# Tokens live for about 10 minutes; refresh a little early to avoid sending an expired one
TOKEN_TTL = timedelta(minutes=10)
REFRESH_MARGIN = timedelta(minutes=1)


class TokenManager:
    def __init__(self):
        self.tokens: Dict[str, str] = {}
        self.token_expires_at: datetime = datetime.now()
        # Serializes refreshes so that concurrent callers launch only one browser
        self.lock = asyncio.Lock()

    async def get_token(self) -> Dict[str, str]:
        """Return a valid token, refreshing it first if it has (nearly) expired."""
        async with self.lock:
            if not self.tokens or datetime.now() >= self.token_expires_at - REFRESH_MARGIN:
                await self._refresh_token()
            return dict(self.tokens)

    async def _refresh_token(self):
        """Launch a headless browser and capture fresh tokens from the page's traffic."""
        logger.info("Refreshing token...")
        tokens: Dict[str, str] = {}

        def handle_request(request):
            headers = request.headers  # header names are lower-cased by Playwright
            if "interface-code" in headers and "identity-id" in headers:
                tokens["interface_code"] = headers["interface-code"]
                tokens["identity_id"] = headers["identity-id"]

        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            try:
                page = await browser.new_page()
                page.on("request", handle_request)
                # Visit AppGallery and let its API calls fire
                await page.goto("https://appgallery.huawei.com/", wait_until="networkidle")
                await page.wait_for_timeout(3000)
            finally:
                await browser.close()

        if not tokens:
            raise RuntimeError("Failed to obtain token")

        self.tokens = tokens
        self.token_expires_at = datetime.now() + TOKEN_TTL
        logger.info("Token refreshed; valid until %s", self.token_expires_at)
```
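
The locking and expiry logic can be tested without a browser by injecting the fetch function and the clock. `CachedToken` below is a hypothetical, dependency-free variant of `TokenManager`; the `fetch` and `clock` parameters are not part of the original design. It shows why the lock matters: many concurrent callers trigger a single refresh.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional


class CachedToken:
    """Cache a token for `ttl` seconds, refreshing it `margin` seconds before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Awaitable[Dict[str, str]]],
        ttl: float = 600.0,
        margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl
        self._margin = margin
        self._clock = clock
        self._tokens: Optional[Dict[str, str]] = None
        self._expires_at = 0.0
        self._lock = asyncio.Lock()

    async def get(self) -> Dict[str, str]:
        async with self._lock:  # only one coroutine refreshes; the rest wait and reuse it
            if self._tokens is None or self._clock() >= self._expires_at - self._margin:
                self._tokens = await self._fetch()
                self._expires_at = self._clock() + self._ttl
            return dict(self._tokens)
```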

#### 5.2.6 Data Processor (crawler/data_processor.py)

```python
from datetime import datetime
from typing import Any, Dict, Optional, Tuple

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.models.app_data_history import AppDataHistory
from app.models.app_info import AppInfo
from app.models.app_metrics import AppMetrics
from app.models.app_rating import AppRating
from app.models.app_rating_history import AppRatingHistory


class DataProcessor:
    def __init__(self, db: AsyncSession):
        self.db = db

    async def save_app_data(
        self,
        app_data: Dict[str, Any],
        rating_data: Optional[Dict[str, Any]] = None,
        comment: Optional[Dict[str, Any]] = None,
    ) -> Tuple[bool, bool, bool]:
        """
        Save one app's data.
        Returns: (app info written, new metric row inserted, new rating row inserted)
        """
        app_id = app_data['appId']
        pkg_name = app_data['pkgName']

        # Check whether the app already exists
        result = await self.db.execute(
            select(AppInfo).where(AppInfo.app_id == app_id)
        )
        existing_app = result.scalar_one_or_none()

        # Save basic app info
        info_inserted = False
        if not existing_app or await self._is_info_changed(existing_app, app_data):
            await self._save_app_info(app_data, comment)
            info_inserted = True

        # Save app metrics
        metric_inserted = False
        if await self._should_save_metric(app_id, app_data):
            await self._save_app_metric(app_data)
            metric_inserted = True

        # Save rating data
        rating_inserted = False
        if rating_data and await self._should_save_rating(app_id, rating_data):
            await self._save_app_rating(app_id, pkg_name, rating_data)
            rating_inserted = True

        # Save raw-data history
        if info_inserted or metric_inserted:
            await self._save_data_history(app_id, pkg_name, app_data)

        if rating_inserted:
            await self._save_rating_history(app_id, pkg_name, rating_data)

        await self.db.commit()

        return info_inserted, metric_inserted, rating_inserted

    async def _save_app_info(self, data: Dict[str, Any], comment: Optional[Dict] = None):
        """Insert or update the app_info row."""
        app_info = AppInfo(
            app_id=data['appId'],
            alliance_app_id=data.get('allianceAppId', ''),
            name=data['name'],
            pkg_name=data['pkgName'],
            dev_id=data['devId'],
            developer_name=data['developerName'],
            dev_en_name=data.get('devEnName', ''),
            supplier=data.get('supplier', ''),
            kind_id=int(data['kindId']),
            kind_name=data['kindName'],
            tag_name=data.get('tagName'),
            kind_type_id=int(data['kindTypeId']),
            kind_type_name=data['kindTypeName'],
            icon_url=data['icon'],
            brief_desc=data['briefDes'],
            description=data['description'],
            privacy_url=data['privacyUrl'],
            # Flags may arrive as strings ("0"/"1"); bool("0") would be True, so convert via int
            iap=bool(int(data.get('iap') or 0)),
            hms=bool(int(data.get('hms') or 0)),
            is_pay=str(data.get('isPay')) == '1',
            is_shelves=str(data.get('isShelves', 1)) != '0',
            comment=comment,
            release_countries=data.get('releaseCountries', []),
            main_device_codes=data.get('mainDeviceCodes', []),
            listed_at=datetime.fromtimestamp(int(data.get('releaseDate') or 0) / 1000)
        )

        # merge() performs an upsert: UPDATE if the primary key exists, INSERT otherwise
        # (add() would raise a duplicate-key error for an existing app)
        await self.db.merge(app_info)
        # Flush now so the app_info row exists before child rows reference it
        await self.db.flush()

    async def _save_app_metric(self, data: Dict[str, Any]):
        """Insert a new app_metrics snapshot."""
        # Normalize the download count (e.g. strip a trailing "+")
        download_count = self._parse_download_count(data.get('downCount', '0'))

        metric = AppMetrics(
            app_id=data['appId'],
            pkg_name=data['pkgName'],
 | ||
|             version=data['version'],
 | ||
|             version_code=int(data['versionCode']),
 | ||
|             size_bytes=int(data['size']),
 | ||
|             sha256=data.get('sha256', ''),
 | ||
|             info_score=float(data.get('hot', '0.0')),
 | ||
|             info_rate_count=int(data.get('rateNum', '0')),
 | ||
|             download_count=download_count,
 | ||
|             price=float(data.get('price', '0')),
 | ||
|             release_date=int(data.get('releaseDate', 0)),
 | ||
|             new_features=data.get('newFeatures', ''),
 | ||
|             upgrade_msg=data.get('upgradeMsg', ''),
 | ||
|             target_sdk=data.get('targetSdk', ''),
 | ||
|             min_sdk=data.get('minsdk', ''),
 | ||
|             compile_sdk_version=int(data.get('compileSdkVersion', 0)),
 | ||
|             min_hmos_api_level=int(data.get('minHmosApiLevel', 0)),
 | ||
|             api_release_type=data.get('apiReleaseType', 'Release')
 | ||
|         )
 | ||
|         
 | ||
|         self.db.add(metric)
 | ||
|     
 | ||
|     async def _save_app_rating(self, app_id: str, pkg_name: str, data: Dict[str, Any]):
 | ||
|         """保存应用评分"""
 | ||
|         rating = AppRating(
 | ||
|             app_id=app_id,
 | ||
|             pkg_name=pkg_name,
 | ||
|             average_rating=float(data['averageRating']),
 | ||
|             star_1_count=int(data['oneStarRatingCount']),
 | ||
|             star_2_count=int(data['twoStarRatingCount']),
 | ||
|             star_3_count=int(data['threeStarRatingCount']),
 | ||
|             star_4_count=int(data['fourStarRatingCount']),
 | ||
|             star_5_count=int(data['fiveStarRatingCount']),
 | ||
|             total_rating_count=int(data['totalStarRatingCount']),
 | ||
|             only_star_count=int(data.get('onlyStarCount', 0)),
 | ||
|             full_average_rating=data.get('fullAverageRating', ''),
 | ||
|             source_type=data.get('sourceType', '')
 | ||
|         )
 | ||
|         
 | ||
|         self.db.add(rating)
 | ||
|     
 | ||
|     def _parse_download_count(self, count_str: str) -> int:
 | ||
|         """解析下载量字符串"""
 | ||
|         # 移除 + 号和其他非数字字符
 | ||
|         count_str = count_str.replace('+', '').replace(',', '')
 | ||
|         try:
 | ||
|             return int(count_str)
 | ||
|         except ValueError:
 | ||
|             return 0
 | ||
|     
 | ||
|     async def _is_info_changed(self, existing: AppInfo, new_data: Dict) -> bool:
 | ||
|         """检查应用信息是否变化"""
 | ||
|         return (
 | ||
|             existing.name != new_data['name'] or
 | ||
|             existing.version != new_data.get('version', '') or
 | ||
|             existing.description != new_data.get('description', '')
 | ||
|         )
 | ||
|     
 | ||
|     async def _should_save_metric(self, app_id: str, data: Dict) -> bool:
 | ||
|         """判断是否需要保存新的指标数据"""
 | ||
|         # 查询最新的指标
 | ||
|         result = await self.db.execute(
 | ||
|             select(AppMetrics)
 | ||
|             .where(AppMetrics.app_id == app_id)
 | ||
|             .order_by(AppMetrics.created_at.desc())
 | ||
|             .limit(1)
 | ||
|         )
 | ||
|         latest_metric = result.scalar_one_or_none()
 | ||
|         
 | ||
|         if not latest_metric:
 | ||
|             return True
 | ||
|         
 | ||
|         # 比较关键字段
 | ||
|         return (
 | ||
|             latest_metric.version != data['version'] or
 | ||
|             latest_metric.download_count != self._parse_download_count(data.get('downCount', '0'))
 | ||
|         )
 | ||
|     
 | ||
|     async def _should_save_rating(self, app_id: str, data: Dict) -> bool:
 | ||
|         """判断是否需要保存新的评分数据"""
 | ||
|         result = await self.db.execute(
 | ||
|             select(AppRating)
 | ||
|             .where(AppRating.app_id == app_id)
 | ||
|             .order_by(AppRating.created_at.desc())
 | ||
|             .limit(1)
 | ||
|         )
 | ||
|         latest_rating = result.scalar_one_or_none()
 | ||
|         
 | ||
|         if not latest_rating:
 | ||
|             return True
 | ||
|         
 | ||
|         return (
 | ||
|             float(latest_rating.average_rating) != float(data['averageRating']) or
 | ||
|             latest_rating.total_rating_count != int(data['totalStarRatingCount'])
 | ||
|         )
 | ||
| ```
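
`save_app_data` treats metrics and ratings as append-only snapshots: a new row is written only when a tracked field differs from the most recent stored row. That decision can be isolated as pure functions and unit-tested without a database. The sketch below is a stdlib-only restatement under that assumption; `MetricSnapshot`, `RatingSnapshot`, `should_save_metric` and `should_save_rating` are hypothetical names, not part of the project.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


def parse_download_count(raw: Any) -> int:
    """Same rule as _parse_download_count: drop '+' and ',' then parse; 0 on failure."""
    text = str(raw).replace("+", "").replace(",", "").strip()
    try:
        return int(text)
    except ValueError:
        return 0


@dataclass(frozen=True)
class MetricSnapshot:
    """Hypothetical stand-in for the latest stored AppMetrics row."""
    version: str
    download_count: int


@dataclass(frozen=True)
class RatingSnapshot:
    """Hypothetical stand-in for the latest stored AppRating row."""
    average_rating: float
    total_rating_count: int


def should_save_metric(latest: Optional[MetricSnapshot], payload: Mapping[str, Any]) -> bool:
    """New snapshot if nothing is stored yet, or the version / download count changed."""
    if latest is None:
        return True
    return (
        latest.version != payload["version"]
        or latest.download_count != parse_download_count(payload.get("downCount", "0"))
    )


def should_save_rating(latest: Optional[RatingSnapshot], payload: Mapping[str, Any]) -> bool:
    """New snapshot if nothing is stored yet, or the average / total count changed."""
    if latest is None:
        return True
    return (
        float(latest.average_rating) != float(payload["averageRating"])
        or latest.total_rating_count != int(payload["totalStarRatingCount"])
    )


if __name__ == "__main__":
    prev = MetricSnapshot(version="1.0.0", download_count=1000)
    print(should_save_metric(prev, {"version": "1.0.0", "downCount": "1,000+"}))  # False
    print(should_save_metric(prev, {"version": "1.0.1", "downCount": "1000"}))    # True
```

Keeping the comparison in one place makes it easy to extend (for example, also snapshotting on a `targetSdk` change) without touching the persistence code.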

#### 5.2.7 API Routes (api/apps.py)

```python
from typing import Any, Dict, Optional

from fastapi import APIRouter, Depends, HTTPException, Path, Query
from sqlalchemy import and_, func, inspect, select
from sqlalchemy.ext.asyncio import AsyncSession

from app.database import get_db
from app.models.app_info import AppInfo
from app.models.app_metrics import AppMetrics
from app.models.app_rating import AppRating
from app.schemas.response import ApiResponse
from app.crawler.huawei_api import HuaweiAPI
from app.crawler.data_processor import DataProcessor

router = APIRouter(prefix="/apps", tags=["Apps"])

# Whitelists for client-supplied column names (getattr on arbitrary input is unsafe)
SEARCH_FIELDS = {"app_id", "name", "pkg_name", "developer_name", "dev_en_name",
                 "kind_name", "kind_type_name"}
INFO_SORT_FIELDS = {"name", "pkg_name", "developer_name", "listed_at"}
METRIC_SORT_FIELDS = {"download_count", "info_score", "info_rate_count", "size_bytes",
                      "price", "release_date", "version_code", "created_at"}


def to_dict(obj: Any) -> Optional[Dict[str, Any]]:
    """ORM instance -> plain dict of column values. obj.__dict__ would also contain
    SQLAlchemy's _sa_instance_state, which is not JSON-serializable."""
    if obj is None:
        return None
    return {attr.key: getattr(obj, attr.key) for attr in inspect(obj).mapper.column_attrs}


def latest_snapshot_subquery(model):
    """Per-app timestamp of the newest row in an append-only snapshot table"""
    return (
        select(model.app_id, func.max(model.created_at).label("max_created_at"))
        .group_by(model.app_id)
        .subquery()
    )


def full_app_query():
    """AppInfo + its latest metric + its latest rating (may be NULL): one row per app"""
    m_latest = latest_snapshot_subquery(AppMetrics)
    r_latest = latest_snapshot_subquery(AppRating)
    return (
        select(AppInfo, AppMetrics, AppRating)
        .select_from(AppInfo)
        .join(m_latest, m_latest.c.app_id == AppInfo.app_id)
        .join(AppMetrics, and_(AppMetrics.app_id == m_latest.c.app_id,
                               AppMetrics.created_at == m_latest.c.max_created_at))
        .outerjoin(r_latest, r_latest.c.app_id == AppInfo.app_id)
        .outerjoin(AppRating, and_(AppRating.app_id == r_latest.c.app_id,
                                   AppRating.created_at == r_latest.c.max_created_at))
    )


def format_row(row) -> Dict[str, Any]:
    return {"info": to_dict(row[0]), "metric": to_dict(row[1]), "rating": to_dict(row[2])}


@router.get("/pkg_name/{pkg_name}")
async def get_app_by_pkg_name(
    pkg_name: str,
    db: AsyncSession = Depends(get_db)
):
    """Look up an app by package name: refresh from the upstream API, fall back to stored data"""
    api = HuaweiAPI()
    try:
        app_data = await api.get_app_info(pkg_name=pkg_name)
        rating_data = await api.get_app_rating(app_data['appId'])

        # Persist the fresh data
        processor = DataProcessor(db)
        new_info, new_metric, new_rating = await processor.save_app_data(app_data, rating_data)
        flags = {"new_info": new_info, "new_metric": new_metric,
                 "new_rating": new_rating, "get_data": True}
    except Exception as e:
        # Upstream call or write failed: discard the partial transaction, serve stored data
        await db.rollback()
        flags = {"get_data": False, "error": str(e)}
    finally:
        await api.close()

    result = await db.execute(full_app_query().where(AppInfo.pkg_name == pkg_name))
    row = result.first()
    if not row:
        raise HTTPException(status_code=404, detail=f"App {pkg_name} not found")

    return ApiResponse(success=True, data={**format_row(row), **flags})


@router.get("/list/{page}")
async def get_app_list(
    page: int = Path(..., ge=1),
    page_size: int = Query(100, ge=1, le=500),
    detail: bool = True,
    sort: Optional[str] = None,
    desc: bool = True,
    search_key: Optional[str] = None,
    search_value: Optional[str] = None,
    search_exact: bool = False,
    db: AsyncSession = Depends(get_db)
):
    """Paginated app list"""
    # Search condition, shared by the data query and the count query
    condition = None
    if search_key and search_value:
        if search_key not in SEARCH_FIELDS:
            raise HTTPException(status_code=400, detail=f"Unsupported search_key: {search_key}")
        column = getattr(AppInfo, search_key)
        condition = (column == search_value) if search_exact else column.like(f"%{search_value}%")

    # detail=True: one row per app with its latest metric/rating; otherwise AppInfo only
    query = full_app_query() if detail else select(AppInfo)
    if condition is not None:
        query = query.where(condition)

    # Sorting (metric columns are only joined in when detail=True)
    if sort:
        if detail and sort in METRIC_SORT_FIELDS:
            order_column = getattr(AppMetrics, sort)
        elif sort in INFO_SORT_FIELDS:
            order_column = getattr(AppInfo, sort)
        else:
            raise HTTPException(status_code=400, detail=f"Unsupported sort field: {sort}")
        query = query.order_by(order_column.desc() if desc else order_column.asc())
    elif detail:
        query = query.order_by(AppMetrics.download_count.desc())
    else:
        query = query.order_by(AppInfo.listed_at.desc())

    # Total count
    count_query = select(func.count()).select_from(AppInfo)
    if condition is not None:
        count_query = count_query.where(condition)
    total_count = (await db.execute(count_query)).scalar_one()

    # Pagination
    query = query.offset((page - 1) * page_size).limit(page_size)
    rows = (await db.execute(query)).all()

    data = [format_row(row) if detail else to_dict(row[0]) for row in rows]

    return ApiResponse(
        success=True,
        data=data,
        total=total_count,
        limit=page_size
    )


@router.get("/metrics/{pkg_name}")
async def get_app_metrics_history(
    pkg_name: str,
    db: AsyncSession = Depends(get_db)
):
    """Metrics snapshot history of one app, newest first"""
    result = await db.execute(
        select(AppMetrics)
        .where(AppMetrics.pkg_name == pkg_name)
        .order_by(AppMetrics.created_at.desc())
    )
    metrics = result.scalars().all()

    return ApiResponse(
        success=True,
        data=[to_dict(m) for m in metrics]
    )
```
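
`get_app_by_pkg_name` follows a "live first, stored data on failure" pattern: it calls the upstream API, persists what it receives, and serves the last stored snapshot when the upstream call fails, reporting which path was taken through `get_data` and `error`. Stripped of FastAPI and SQLAlchemy, the control flow reduces to the coroutine below. `fetch_with_fallback`, `fetch_live` and `load_cached` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def fetch_with_fallback(
    fetch_live: Callable[[], Awaitable[Any]],
    load_cached: Callable[[], Awaitable[Optional[Any]]],
) -> Dict[str, Any]:
    """Try the live source first; on any error fall back to the cached copy.

    get_data=True means fresh data was obtained; otherwise get_data=False plus the
    error text. Raises LookupError when neither source has the item (the route's 404).
    """
    try:
        data = await fetch_live()
        return {"data": data, "get_data": True}
    except Exception as exc:  # an upstream failure should not hide stored data
        cached = await load_cached()
        if cached is None:
            raise LookupError("not found upstream or in storage") from exc
        return {"data": cached, "get_data": False, "error": str(exc)}


async def demo() -> Dict[str, Any]:
    async def broken_upstream():
        raise TimeoutError("upstream timed out")

    async def stored_copy():
        return {"pkg_name": "com.example.app"}

    return await fetch_with_fallback(broken_upstream, stored_copy)


print(asyncio.run(demo()))
```

Returning the error string alongside stored data lets the frontend show possibly stale data with a warning instead of failing outright.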

#### 5.2.8 Rankings API (api/rankings.py)

```python
from typing import Optional

from fastapi import APIRouter, Depends, Query
from sqlalchemy import and_, func, select
from sqlalchemy.ext.asyncio import AsyncSession

from app.database import get_db
from app.models.app_info import AppInfo
from app.models.app_metrics import AppMetrics
from app.models.app_rating import AppRating
from app.schemas.response import ApiResponse

router = APIRouter(prefix="/rankings", tags=["Rankings"])


def latest_snapshot_subquery(model):
    """Per-app timestamp of the newest row in an append-only snapshot table"""
    return (
        select(model.app_id, func.max(model.created_at).label('max_created_at'))
        .group_by(model.app_id)
        .subquery()
    )


@router.get("/top-downloads")
async def get_top_downloads(
    limit: int = Query(10, ge=1, le=100),
    exclude_pattern: Optional[str] = Query(None),
    db: AsyncSession = Depends(get_db)
):
    """Download ranking, based on each app's latest metrics snapshot"""
    latest = latest_snapshot_subquery(AppMetrics)

    query = (
        select(AppInfo, AppMetrics)
        .join(AppMetrics, AppInfo.app_id == AppMetrics.app_id)
        .join(
            latest,
            and_(
                AppMetrics.app_id == latest.c.app_id,
                AppMetrics.created_at == latest.c.max_created_at
            )
        )
    )

    # Exclude packages whose name contains the pattern
    if exclude_pattern:
        query = query.where(~AppInfo.pkg_name.like(f"%{exclude_pattern}%"))

    query = query.order_by(AppMetrics.download_count.desc()).limit(limit)

    result = await db.execute(query)
    rows = result.all()

    data = [
        {
            "app_id": row[0].app_id,
            "name": row[0].name,
            "pkg_name": row[0].pkg_name,
            "developer_name": row[0].developer_name,
            "icon_url": row[0].icon_url,
            "download_count": row[1].download_count,
            "version": row[1].version
        }
        for row in rows
    ]

    return ApiResponse(success=True, data=data, limit=limit)


@router.get("/ratings")
async def get_top_ratings(
    limit: int = Query(10, ge=1, le=100),
    db: AsyncSession = Depends(get_db)
):
    """Rating ranking, based on each app's latest rating snapshot"""
    latest = latest_snapshot_subquery(AppRating)

    query = (
        select(AppInfo, AppRating)
        .join(AppRating, AppInfo.app_id == AppRating.app_id)
        .join(
            latest,
            and_(
                AppRating.app_id == latest.c.app_id,
                AppRating.created_at == latest.c.max_created_at
            )
        )
        .where(AppRating.total_rating_count >= 100)  # at least 100 ratings
        .order_by(AppRating.average_rating.desc())
        .limit(limit)
    )

    result = await db.execute(query)
    rows = result.all()

    data = [
        {
            "app_id": row[0].app_id,
            "name": row[0].name,
            "pkg_name": row[0].pkg_name,
            "developer_name": row[0].developer_name,
            "icon_url": row[0].icon_url,
            "average_rating": float(row[1].average_rating),
            "total_rating_count": row[1].total_rating_count
        }
        for row in rows
    ]

    return ApiResponse(success=True, data=data, limit=limit)


@router.get("/developers")
async def get_top_developers(
    limit: int = Query(10, ge=1, le=100),
    db: AsyncSession = Depends(get_db)
):
    """Developer ranking by number of apps"""
    # Join only the latest snapshot per app; joining every snapshot would inflate
    # both the app count and the download total
    latest = latest_snapshot_subquery(AppMetrics)

    query = (
        select(
            AppInfo.developer_name,
            func.count(AppInfo.app_id).label('app_count'),
            func.sum(AppMetrics.download_count).label('total_downloads')
        )
        .join(AppMetrics, AppInfo.app_id == AppMetrics.app_id)
        .join(
            latest,
            and_(
                AppMetrics.app_id == latest.c.app_id,
                AppMetrics.created_at == latest.c.max_created_at
            )
        )
        .group_by(AppInfo.developer_name)
        .order_by(func.count(AppInfo.app_id).desc())
        .limit(limit)
    )

    result = await db.execute(query)
    rows = result.all()

    data = [
        {
            "developer_name": row[0],
            "app_count": row[1],
            "total_downloads": int(row[2] or 0)
        }
        for row in rows
    ]

    return ApiResponse(success=True, data=data, limit=limit)
```
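
The rankings rely on a groupwise-maximum subquery: for each `app_id`, keep only the row with the greatest `created_at`, then sort. Because metrics and ratings are append-only, joining without that step would count every historical snapshot. The same logic in plain Python is handy for checking expected results against a small fixture; the `Snapshot` type and both function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Illustrative stand-in for one AppMetrics row."""
    app_id: str
    created_at: datetime
    download_count: int


def latest_per_app(rows: Iterable[Snapshot]) -> List[Snapshot]:
    """Equivalent of GROUP BY app_id + MAX(created_at), joined back to the full row."""
    latest: Dict[str, Snapshot] = {}
    for row in rows:
        current = latest.get(row.app_id)
        if current is None or row.created_at > current.created_at:
            latest[row.app_id] = row
    return list(latest.values())


def top_downloads(rows: Iterable[Snapshot], limit: int = 10) -> List[Snapshot]:
    """Rank apps by the download count of their latest snapshot only."""
    return sorted(latest_per_app(rows), key=lambda r: r.download_count, reverse=True)[:limit]


rows = [
    Snapshot("a", datetime(2024, 1, 1), 500),
    Snapshot("a", datetime(2024, 1, 2), 600),  # newer snapshot of "a" replaces the old one
    Snapshot("b", datetime(2024, 1, 1), 900),
]
print([(r.app_id, r.download_count) for r in top_downloads(rows)])  # [('b', 900), ('a', 600)]
```

One difference from the SQL: when two snapshots of the same app share the same `created_at`, the join returns both rows, while this sketch keeps the first one it sees.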

#### 5.2.9 Scheduled Tasks (scheduler/tasks.py)

```python
import asyncio
import random
from datetime import datetime

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.interval import IntervalTrigger
from sqlalchemy import select

from app.config import settings
from app.crawler.data_processor import DataProcessor
from app.crawler.huawei_api import HuaweiAPI
from app.database import AsyncSessionLocal
from app.models.app_info import AppInfo

# Max apps synced concurrently inside a batch. Assumed value: every in-flight task holds
# its own DB connection, so keep this at or below the engine's connection pool size.
MAX_CONCURRENCY = 10


class CrawlerScheduler:
    def __init__(self):
        self.scheduler = AsyncIOScheduler()
        self.is_running = False

    def start(self):
        """Start the scheduler"""
        self.scheduler.add_job(
            self.sync_all_apps,
            trigger=IntervalTrigger(seconds=settings.CRAWLER_INTERVAL),
            id='sync_all_apps',
            name='Sync all apps',
            replace_existing=True,
            max_instances=1,  # never run two syncs at the same time
            coalesce=True     # collapse missed runs into a single run
        )
        self.scheduler.start()
        print(f"Scheduler started, sync interval: {settings.CRAWLER_INTERVAL}s")

    def stop(self):
        """Stop the scheduler"""
        self.scheduler.shutdown()
        print("Scheduler stopped")

    async def sync_all_apps(self):
        """Sync every app already stored in the database"""
        if self.is_running:
            print("Previous sync still running, skipping this run")
            return

        self.is_running = True
        print(f"Starting full sync - {datetime.now()}")
        api = None

        try:
            # Load all package names, then shuffle them
            async with AsyncSessionLocal() as db:
                rows = await db.execute(select(AppInfo.pkg_name))
                pkg_names = list(rows.scalars().all())
            random.shuffle(pkg_names)
            print(f"{len(pkg_names)} apps to sync")

            api = HuaweiAPI()
            semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
            total_processed = total_updated = total_failed = 0

            for i in range(0, len(pkg_names), settings.CRAWLER_BATCH_SIZE):
                batch = pkg_names[i:i + settings.CRAWLER_BATCH_SIZE]

                # Run the batch concurrently; failures come back as exception objects
                results = await asyncio.gather(
                    *(self._sync_single_app(api, semaphore, pkg_name) for pkg_name in batch),
                    return_exceptions=True
                )

                # Tally the results
                for result in results:
                    total_processed += 1
                    if isinstance(result, Exception):
                        total_failed += 1
                    elif result:
                        total_updated += 1

                print(f"Processed {total_processed}/{len(pkg_names)} apps")

                # Short pause between batches to go easy on the upstream API
                await asyncio.sleep(0.5)

            print(f"Sync finished - processed: {total_processed}, "
                  f"updated: {total_updated}, failed: {total_failed}")

        except Exception as e:
            print(f"Sync failed: {e}")

        finally:
            if api is not None:
                await api.close()
            self.is_running = False

    async def _sync_single_app(
        self,
        api: HuaweiAPI,
        semaphore: asyncio.Semaphore,
        pkg_name: str
    ) -> bool:
        """Sync one app. Returns True if a new row was written; re-raises on failure."""
        async with semaphore:
            # An AsyncSession must not be shared by concurrent tasks: one session per task
            async with AsyncSessionLocal() as db:
                try:
                    app_data = await api.get_app_info(pkg_name=pkg_name)
                    rating_data = await api.get_app_rating(app_data['appId'])

                    processor = DataProcessor(db)
                    new_info, new_metric, new_rating = await processor.save_app_data(
                        app_data, rating_data
                    )
                    return new_info or new_metric or new_rating

                except Exception as e:
                    print(f"Failed to sync {pkg_name}: {e}")
                    raise  # let gather() count it as a failure


# Global scheduler instance
scheduler = CrawlerScheduler()
```
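
`sync_all_apps` walks the package names in fixed-size batches, runs each batch through `asyncio.gather(..., return_exceptions=True)`, and tallies processed, updated and failed items. The pattern does not depend on the crawler, so the sketch below reproduces it with a stub worker and exercises the counting logic without network or database access. `run_in_batches`, `max_concurrency` and `stub_worker` are illustrative assumptions; the semaphore caps the number of in-flight tasks in this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Sequence


async def run_in_batches(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[bool]],
    batch_size: int,
    max_concurrency: int,
    pause: float = 0.0,
) -> Dict[str, int]:
    """Run worker over items batch by batch.

    worker returns True if it changed something, False if not, and raises on failure.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    stats = {"processed": 0, "updated": 0, "failed": 0}

    async def guarded(item: str) -> bool:
        async with semaphore:  # at most max_concurrency workers in flight
            return await worker(item)

    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results = await asyncio.gather(*(guarded(i) for i in batch), return_exceptions=True)
        for result in results:
            stats["processed"] += 1
            if isinstance(result, Exception):
                stats["failed"] += 1
            elif result:
                stats["updated"] += 1
        if pause:
            await asyncio.sleep(pause)  # breathing room between batches
    return stats


async def stub_worker(pkg_name: str) -> bool:
    """Stub: 'bad.*' packages fail, 'new.*' packages report an update."""
    await asyncio.sleep(0)
    if pkg_name.startswith("bad."):
        raise RuntimeError(f"cannot fetch {pkg_name}")
    return pkg_name.startswith("new.")


print(asyncio.run(run_in_batches(["new.a", "old.b", "bad.c", "new.d"], stub_worker, 2, 2)))
# {'processed': 4, 'updated': 2, 'failed': 1}
```

For the tally to be meaningful, the worker has to raise on failure rather than return `False`; otherwise failures are indistinguishable from "nothing changed".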

#### 5.2.10 Main Application (main.py)

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api import apps, charts, rankings, submit
from app.config import settings
from app.scheduler.tasks import scheduler


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifespan: start the crawler scheduler on startup, stop it on shutdown.

    Every worker process runs this hook, so running several workers
    (e.g. `gunicorn -w 4`) starts one scheduler per worker.
    """
    print("Application starting...")
    scheduler.start()
    try:
        yield
    finally:
        print("Application shutting down...")
        scheduler.stop()


# Create the FastAPI app
app = FastAPI(
    title=settings.API_TITLE,
    version=settings.API_VERSION,
    lifespan=lifespan
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.CORS_ORIGINS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Register routers
app.include_router(apps.router, prefix=settings.API_PREFIX)
app.include_router(rankings.router, prefix=settings.API_PREFIX)
app.include_router(charts.router, prefix=settings.API_PREFIX)
app.include_router(submit.router, prefix=settings.API_PREFIX)


@app.get("/")
async def root():
    return {"message": "Huawei AppGallery Data API", "version": settings.API_VERSION}


@app.get("/health")
async def health_check():
    return {"status": "healthy"}


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        "app.main:app",
        host="0.0.0.0",
        port=8000,
        reload=settings.DEBUG
    )
```
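
FastAPI runs the code before `yield` in `lifespan` at startup and the code after it at shutdown. With `asynccontextmanager`, code placed after a bare `yield` is skipped when an exception is thrown into the generator, so wrapping the `yield` in `try/finally` is what guarantees the scheduler is stopped on an abnormal exit. The stdlib-only sketch below demonstrates this with a fake scheduler; `FakeScheduler` and `lifespan_sketch` are hypothetical stand-ins for `CrawlerScheduler` and the real hook.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class FakeScheduler:
    """Records start/stop calls instead of scheduling jobs."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")


@asynccontextmanager
async def lifespan_sketch(scheduler: FakeScheduler) -> AsyncIterator[None]:
    scheduler.start()      # startup phase
    try:
        yield              # the application serves requests here
    finally:
        scheduler.stop()   # shutdown phase, also runs when an exception escapes


async def demo() -> List[str]:
    sched = FakeScheduler()
    try:
        async with lifespan_sketch(sched):
            raise RuntimeError("crash while serving")
    except RuntimeError:
        pass
    return sched.events


print(asyncio.run(demo()))  # ['start', 'stop']
```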

### 5.3 Dependencies (requirements.txt)

```txt
fastapi==0.109.0
uvicorn[standard]==0.27.0
sqlalchemy==2.0.25
aiomysql==0.2.0
# PyMySQL (used by aiomysql) needs cryptography for MySQL 8's default caching_sha2_password auth
cryptography==42.0.0
pydantic==2.5.3
pydantic-settings==2.1.0
httpx==0.26.0
playwright==1.41.0
apscheduler==3.10.4
python-dotenv==1.0.0
python-multipart==0.0.6
```

### 5.4 Environment Configuration (.env.example)

```env
# Database
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_password
MYSQL_DATABASE=huawei_market

# Huawei API
HUAWEI_API_BASE_URL=https://web-drcn.hispace.dbankcloud.com/edge
HUAWEI_LOCALE=zh_CN

# Crawler (CRAWLER_INTERVAL is in seconds)
CRAWLER_INTERVAL=1800
CRAWLER_BATCH_SIZE=100
CRAWLER_TIMEOUT=30

# API
API_PREFIX=/api
API_TITLE=Huawei AppGallery Data API
API_VERSION=1.0.0

# Misc (CORS_ORIGINS must be a JSON array)
DEBUG=False
CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
```
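
Every value in `.env` is a string; the `settings` object imported from `app.config` is responsible for converting it. pydantic-settings decodes fields declared with complex types (such as a list of strings) as JSON, which is why `CORS_ORIGINS` is written as a JSON array. The stdlib sketch below mimics that conversion so the example file can be sanity-checked without installing the project; `load_env` and `parse_settings` are hypothetical helpers, and the field types are assumptions about the real `Settings` class.

```python
import json
from typing import Any, Dict


def load_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (no quoting/escaping support)."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def parse_settings(raw: Dict[str, str]) -> Dict[str, Any]:
    """Convert selected fields the way a typed settings class would (types assumed)."""
    return {
        "MYSQL_PORT": int(raw["MYSQL_PORT"]),
        "CRAWLER_INTERVAL": int(raw["CRAWLER_INTERVAL"]),
        "CRAWLER_BATCH_SIZE": int(raw["CRAWLER_BATCH_SIZE"]),
        "DEBUG": raw["DEBUG"].lower() in {"1", "true", "yes", "on"},
        "CORS_ORIGINS": json.loads(raw["CORS_ORIGINS"]),  # complex type -> JSON decoding
    }


sample = """
MYSQL_PORT=3306
CRAWLER_INTERVAL=1800
CRAWLER_BATCH_SIZE=100
DEBUG=False
CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
"""
print(parse_settings(load_env(sample)))
```

Because the scheduler passes the value to `IntervalTrigger(seconds=...)`, `CRAWLER_INTERVAL=1800` means one full sync every 30 minutes.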

---

## 6. Frontend Development

### 6.1 Project Structure

```
frontend/
├── public/
│   └── favicon.ico
├── src/
│   ├── assets/              # Static assets
│   │   ├── styles/
│   │   │   └── main.css
│   │   └── images/
│   ├── components/          # Components
│   │   ├── AppCard.vue
│   │   ├── AppTable.vue
│   │   ├── ChartCard.vue
│   │   ├── StatCard.vue
│   │   └── SearchBar.vue
│   ├── views/               # Pages
│   │   ├── Dashboard.vue
│   │   ├── AppDetail.vue
│   │   └── Rankings.vue
│   ├── api/                 # API wrappers
│   │   ├── index.ts
│   │   └── apps.ts
│   ├── stores/              # State management
│   │   └── app.ts
│   ├── types/               # Type definitions
│   │   └── app.ts
│   ├── utils/               # Utility functions
│   │   └── format.ts
│   ├── router/              # Router
│   │   └── index.ts
│   ├── App.vue
│   └── main.ts
├── index.html
├── package.json
├── tsconfig.json
├── vite.config.ts
└── README.md
```

### 6.2 Core Implementation

#### 6.2.1 Type Definitions (types/app.ts)

```typescript
export interface AppInfo {
  app_id: string
  name: string
  pkg_name: string
  developer_name: string
  dev_en_name?: string
  kind_name: string
  kind_type_name: string
  icon_url: string
  brief_desc: string
  description: string
  privacy_url: string
  iap: boolean
  is_pay: boolean
  listed_at: string
  created_at: string
}

export interface AppMetric {
  id: number
  app_id: string
  pkg_name: string
  version: string
  version_code: number
  size_bytes: number
  download_count: number
  info_score: number
  info_rate_count: number
  price: number
  release_date: number
  target_sdk: string
  min_sdk: string
  created_at: string
}

export interface AppRating {
  id: number
  app_id: string
  average_rating: number
  star_1_count: number
  star_2_count: number
  star_3_count: number
  star_4_count: number
  star_5_count: number
  total_rating_count: number
  created_at: string
}

export interface FullAppInfo {
  info: AppInfo
  metric: AppMetric
  rating?: AppRating
}

export interface ApiResponse<T = any> {
  success: boolean
  data: T
  total?: number
  limit?: number
  timestamp: string
}

export interface MarketStats {
  app_count: {
    total: number
    apps: number
    atomic_services: number
  }
  developer_count: number
}

export interface RankingItem {
  app_id: string
  name: string
  pkg_name: string
  developer_name: string
  icon_url: string
  download_count?: number
  version?: string
  average_rating?: number
  total_rating_count?: number
}
```
#### 6.2.2 API wrapper (api/apps.ts)

```typescript
import axios from 'axios'
import type { ApiResponse, FullAppInfo, MarketStats, RankingItem } from '@/types/app'

// Behind the production Nginx proxy, build with VITE_API_BASE_URL=/api so requests stay same-origin
const api = axios.create({
  baseURL: import.meta.env.VITE_API_BASE_URL || 'http://localhost:8000/api',
  timeout: 30000
})

// Request interceptor
api.interceptors.request.use(
  config => {
    // Attach auth tokens or other headers here if needed
    return config
  },
  error => {
    return Promise.reject(error)
  }
)

// Response interceptor: unwrap the body so callers receive ApiResponse<T> directly
api.interceptors.response.use(
  response => {
    return response.data
  },
  error => {
    console.error('API Error:', error)
    return Promise.reject(error)
  }
)

export const appsApi = {
  // Market-wide statistics
  getMarketInfo: () =>
    api.get<any, ApiResponse<MarketStats>>('/market_info'),

  // Look up an app by package name
  getAppByPkgName: (pkgName: string) =>
    api.get<any, ApiResponse<FullAppInfo>>(`/apps/pkg_name/${pkgName}`),

  // Look up an app by app ID
  getAppById: (appId: string) =>
    api.get<any, ApiResponse<FullAppInfo>>(`/apps/app_id/${appId}`),

  // App list: the page number goes in the path, the remaining options in the query string
  getAppList: ({ page, ...query }: {
    page: number
    page_size?: number
    detail?: boolean
    sort?: string
    desc?: boolean
    search_key?: string
    search_value?: string
    search_exact?: boolean
  }) =>
    api.get<any, ApiResponse<FullAppInfo[]>>(`/apps/list/${page}`, { params: query }),

  // Metric history of an app
  getAppMetrics: (pkgName: string) =>
    api.get<any, ApiResponse<any[]>>(`/apps/metrics/${pkgName}`),

  // Download ranking
  getTopDownloads: (params?: { limit?: number; exclude_pattern?: string }) =>
    api.get<any, ApiResponse<RankingItem[]>>('/rankings/top-downloads', { params }),

  // Rating ranking
  getTopRatings: (params?: { limit?: number }) =>
    api.get<any, ApiResponse<RankingItem[]>>('/rankings/ratings', { params }),

  // Developer ranking
  getTopDevelopers: (params?: { limit?: number }) =>
    api.get<any, ApiResponse<any[]>>('/rankings/developers', { params }),

  // Rating distribution
  getRatingDistribution: () =>
    api.get<any, ApiResponse<Record<string, number>>>('/charts/rating'),

  // SDK version distributions
  getMinSdkDistribution: () =>
    api.get<any, ApiResponse<Record<string, number>>>('/charts/min_sdk'),

  getTargetSdkDistribution: () =>
    api.get<any, ApiResponse<Record<string, number>>>('/charts/target_sdk'),

  // Submit an app
  submitApp: (data: {
    pkg_name?: string
    app_id?: string
    comment?: any
  }) =>
    api.post<any, ApiResponse<any>>('/submit', data)
}

export default api
```
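Python scripts or backend smoke tests can call the same list endpoint. The sketch below builds that request URL under a few assumptions. `app_list_url` and `BASE_URL` are hypothetical names, and the base URL mirrors the axios default above. Booleans are sent as lowercase `true`/`false` to match how axios serializes them.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api"  # assumption: same default as VITE_API_BASE_URL


def app_list_url(page: int, base_url: str = BASE_URL, **query) -> str:
    """Build GET /apps/list/{page}: page in the path, other options as query parameters."""
    # Drop unset options and encode booleans as lowercase strings, as axios does
    params = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in query.items()
        if value is not None
    }
    url = f"{base_url.rstrip('/')}/apps/list/{page}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `app_list_url(2, page_size=50, desc=True)` returns `http://localhost:8000/api/apps/list/2?page_size=50&desc=true`.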
#### 6.2.3 State management (stores/app.ts)

```typescript
import { defineStore } from 'pinia'
import { ref, computed } from 'vue'
import { appsApi } from '@/api/apps'
import type { MarketStats, FullAppInfo } from '@/types/app'

export const useAppStore = defineStore('app', () => {
  // State
  const marketStats = ref<MarketStats | null>(null)
  const appList = ref<FullAppInfo[]>([])
  const currentPage = ref(1)
  const pageSize = ref(100)
  const totalCount = ref(0)
  const loading = ref(false)

  // Getters
  const totalPages = computed(() => Math.ceil(totalCount.value / pageSize.value))

  // Actions
  const fetchMarketStats = async () => {
    try {
      const response = await appsApi.getMarketInfo()
      if (response.success) {
        marketStats.value = response.data
      }
    } catch (error) {
      console.error('Failed to fetch market statistics:', error)
    }
  }

  const fetchAppList = async (params: {
    page?: number
    page_size?: number
    sort?: string
    desc?: boolean
    search_key?: string
    search_value?: string
    search_exact?: boolean
  } = {}) => {
    loading.value = true
    try {
      // Spread params first so an explicit `page: undefined` cannot overwrite the defaults
      const response = await appsApi.getAppList({
        ...params,
        page: params.page ?? currentPage.value,
        page_size: params.page_size ?? pageSize.value,
        detail: true
      })

      if (response.success) {
        appList.value = response.data
        totalCount.value = response.total ?? 0
        currentPage.value = params.page ?? currentPage.value
        // Keep pageSize in sync so totalPages stays correct
        pageSize.value = params.page_size ?? pageSize.value
      }
    } catch (error) {
      console.error('Failed to fetch the app list:', error)
    } finally {
      loading.value = false
    }
  }

  const searchApps = async (searchKey: string, searchValue: string, exact: boolean = false) => {
    await fetchAppList({
      page: 1,
      search_key: searchKey,
      search_value: searchValue,
      search_exact: exact
    })
  }

  return {
    marketStats,
    appList,
    currentPage,
    pageSize,
    totalCount,
    totalPages,
    loading,
    fetchMarketStats,
    fetchAppList,
    searchApps
  }
})
```
#### 6.2.4 Utility functions (utils/format.ts)

```typescript
/**
 * Format a byte count as a human-readable size (1024-based units).
 */
export function formatFileSize(bytes: number): string {
  if (!Number.isFinite(bytes) || bytes <= 0) return '0 B'
  const k = 1024
  const sizes = ['B', 'KB', 'MB', 'GB', 'TB']
  // Clamp the index so values below 1 B or at/above 1024 TB still map to a valid unit
  const i = Math.min(Math.max(Math.floor(Math.log(bytes) / Math.log(k)), 0), sizes.length - 1)
  return Math.round((bytes / Math.pow(k, i)) * 100) / 100 + ' ' + sizes[i]
}

/**
 * Format a download count with Chinese units (万 = 10^4, 亿 = 10^8).
 */
export function formatDownloadCount(count: number): string {
  if (count >= 100000000) {
    return (count / 100000000).toFixed(1) + '亿'
  } else if (count >= 10000) {
    return (count / 10000).toFixed(1) + '万'
  }
  return count.toString()
}

/**
 * Format a date and time (accepts an ISO string or a millisecond timestamp).
 */
export function formatDate(date: string | number): string {
  const d = new Date(date)
  // toLocaleString rather than toLocaleDateString, because the output includes hours and minutes
  return d.toLocaleString('zh-CN', {
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit'
  })
}

/**
 * Format a rating to one decimal place.
 */
export function formatRating(rating: number): string {
  return rating.toFixed(1)
}

/**
 * Build a 5-element star array (true = filled star).
 * A fractional part >= 0.5 fills the next star, so the rating is effectively rounded.
 */
export function getStarArray(rating: number): boolean[] {
  const fullStars = Math.floor(rating)
  const hasHalfStar = rating % 1 >= 0.5
  const stars: boolean[] = []

  for (let i = 0; i < 5; i++) {
    stars.push(i < fullStars || (i === fullStars && hasHalfStar))
  }

  return stars
}
```
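The backend may also need human-readable values, for example in CSV exports. The two main formatters can be ported to Python as sketched below. The function names are hypothetical, and the outputs aim to match the TypeScript versions. Python's `round()` rounds halves to even, so results may differ from `Math.round` in rare halfway cases.

```python
import math


def format_file_size(num_bytes: float) -> str:
    """Port of formatFileSize: 1024-based units with at most two decimals."""
    if not math.isfinite(num_bytes) or num_bytes <= 0:
        return "0 B"
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    i = 0
    # Divide instead of using log() to avoid floating-point edge cases at exact powers of 1024
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    value = round(value, 2)
    # Print whole numbers without ".0", like JavaScript's number-to-string conversion
    text = str(int(value)) if value.is_integer() else str(value)
    return f"{text} {units[i]}"


def format_download_count(count: int) -> str:
    """Port of formatDownloadCount: 亿 = 10^8, 万 = 10^4, one decimal place."""
    if count >= 100_000_000:
        return f"{count / 100_000_000:.1f}亿"
    if count >= 10_000:
        return f"{count / 10_000:.1f}万"
    return str(count)
```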
---

## Appendix A: How to obtain app package names

### A.1 From the Huawei AppGallery website

#### Method 1: Extract it from the URL

Open the app's detail page on Huawei AppGallery. The URL looks like this:

```
https://appgallery.huawei.com/app/C1164531384803416384
```

or:

```
https://appgallery.huawei.com/#/app/C1164531384803416384
```

**Note:** the value in the URL is the `app_id`, not the package name. One more step is needed to get the package name.
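One regular expression can handle both URL forms. This minimal sketch assumes that app IDs are `C` followed by digits, as in the examples. `extract_app_id` is a hypothetical helper, and it ignores anything after the ID, such as a query string.

```python
import re
from typing import Optional

# Assumption: app IDs are "C" followed by digits, as in the example URLs
APP_ID_RE = re.compile(r"/app/(C\d+)")


def extract_app_id(url: str) -> Optional[str]:
    """Return the app_id from an AppGallery detail URL (path or #/ hash form), or None."""
    match = APP_ID_RE.search(url)
    return match.group(1) if match else None
```

The `app_id` returned here is what the API is then queried with to look up the package name.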
#### Method 2: Extract it from the page source

If the page content is rendered by JavaScript, the raw source may not contain these fields. In that case, use Method 3.

1. Open the app detail page.
2. Right-click -> View page source.
3. Search for `"pkgName"` or `"packageName"`.
4. Look for something like:

```json
{
  "pkgName": "com.huawei.hmsapp.appgallery",
  "appId": "C1164531384803416384",
  ...
}
```
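If the page source is saved to a file, the lookup can be scripted. This sketch assumes the fields appear as plain `"pkgName": "..."` JSON pairs, as in the snippet above. Escaped or minified variants may need a different pattern. `find_pkg_name` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches "pkgName": "..." or "packageName": "..." in embedded JSON
PKG_FIELD_RE = re.compile(r'"(?:pkgName|packageName)"\s*:\s*"([^"]+)"')


def find_pkg_name(page_source: str) -> Optional[str]:
    """Return the first package name found in the page source, or None."""
    match = PKG_FIELD_RE.search(page_source)
    return match.group(1) if match else None
```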
#### Method 3: Use the browser developer tools

1. Open the app detail page.
2. Press F12 to open the developer tools.
3. Switch to the Network tab.
4. Reload the page.
5. Filter by XHR and find the request related to `appinfo`.
6. Inspect that request's Response and find the `pkgName` field.

**Example:**
```
Network -> XHR -> appinfo
Response:
{
  "pkgName": "com.huawei.hmsapp.appgallery",
  "name": "应用市场",
  ...
}
```
### A.2 From an Android device

#### Method 1: ADB commands

If you have an Android device or emulator:

```bash
# List the package names of all installed apps
adb shell pm list packages

# List third-party apps only
adb shell pm list packages -3

# Search for specific apps (e.g. those containing "huawei");
# grep runs on the host, so use findstr on Windows
adb shell pm list packages | grep huawei

# Show the package name of the app currently in the foreground
adb shell dumpsys window | grep mCurrentFocus
```

**Sample output:**
```
package:com.huawei.hmsapp.appgallery
package:com.huawei.browser
package:com.huawei.music
```
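To turn that output into a clean list for import, strip the `package:` prefix. This is a minimal sketch. `parse_pm_list_packages` is a hypothetical helper, and it handles only the default output format (options such as `-f` add an APK path to each line).

```python
def parse_pm_list_packages(output: str) -> list[str]:
    """Turn `adb shell pm list packages` output into a sorted, de-duplicated list."""
    prefix = "package:"
    names = {
        line.strip()[len(prefix):]
        for line in output.splitlines()
        # strip() also removes the \r that adb emits on Windows hosts
        if line.strip().startswith(prefix)
    }
    return sorted(names)
```

On a host with adb installed, pass it the command output, e.g. `parse_pm_list_packages(subprocess.run(["adb", "shell", "pm", "list", "packages", "-3"], capture_output=True, text=True, check=True).stdout)`.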
#### Method 2: Use an app-info viewer

Install an "app info viewer" type of app on the Android device, for example:
- **Package Name Viewer**
- **App Inspector**
- **Dev Tools**

These apps show the package names of installed apps directly.

### A.3 Collecting package names in bulk

#### Method 1: Crawl AppGallery category pages

AppGallery pages may be rendered client-side by JavaScript. If the static HTML contains no `/app/` links, render the page with a headless browser (e.g. Playwright) first.

```python
import asyncio
import re

import httpx
from bs4 import BeautifulSoup

APP_ID_RE = re.compile(r"/app/(C\d+)")


async def get_apps_from_category(category_id: str) -> list[str]:
    """Collect the app IDs linked from a category page."""
    url = f"https://appgallery.huawei.com/Featured/{category_id}"

    async with httpx.AsyncClient(follow_redirects=True, timeout=30) as client:
        response = await client.get(url)
        response.raise_for_status()

    soup = BeautifulSoup(response.text, 'html.parser')

    # Keep only links to app detail pages; de-duplicate while preserving order
    app_ids: list[str] = []
    for link in soup.find_all('a', href=True):
        match = APP_ID_RE.search(link['href'])
        if match and match.group(1) not in app_ids:
            app_ids.append(match.group(1))

    return app_ids


if __name__ == "__main__":
    # Top-level await only works in a notebook or REPL; use asyncio.run in a script
    app_ids = asyncio.run(get_apps_from_category('10000000'))  # Tools category
    print(app_ids)
```
#### Method 2: Guess app IDs

A Huawei app_id has the form `C` + 19 digits.

Apps can be discovered by iterating over a numeric range:

```python
import asyncio

# HuaweiAPI is the crawler client defined in the backend section of this document
from app.crawler.huawei_api import HuaweiAPI


async def guess_app_ids(start: int, end: int, delay: float = 0.5) -> list[dict]:
    """Probe the app IDs in [start, end) and return those that resolve to an app."""
    api = HuaweiAPI()
    found_apps = []

    try:
        for i in range(start, end):
            app_id = f"C{i:019d}"
            try:
                app_data = await api.get_app_info(app_id=app_id)
            except Exception:
                app_data = None  # most IDs in a range do not exist

            if app_data:
                found_apps.append({
                    'app_id': app_id,
                    'pkg_name': app_data['pkgName'],
                    'name': app_data['name'],
                })
                print(f"Found app: {app_data['name']} ({app_data['pkgName']})")

            await asyncio.sleep(delay)  # throttle requests (see A.7)
    finally:
        await api.close()

    return found_apps


if __name__ == "__main__":
    apps = asyncio.run(guess_app_ids(1164531384803416384, 1164531384803416484))
```
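With that format known, IDs can be validated before they are sent to the API, and a numeric counter can be converted back into an ID. This minimal sketch assumes exactly 19 digits, as stated above. The helper names are hypothetical. `[0-9]` is used instead of `\d` so that non-ASCII digits are rejected.

```python
import re

# "C" followed by exactly 19 ASCII digits
APP_ID_FORMAT = re.compile(r"C[0-9]{19}")


def is_valid_app_id(app_id: str) -> bool:
    """Check an app_id against the C + 19 digits format."""
    return APP_ID_FORMAT.fullmatch(app_id) is not None


def to_app_id(n: int) -> str:
    """Zero-pad a number to 19 digits and add the "C" prefix."""
    if not 0 <= n < 10**19:
        raise ValueError("numeric part must fit in 19 digits")
    return f"C{n:019d}"
```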
#### Method 3: Expand from an existing database

If you already have some app data, you can use it as a starting point for finding more apps:

1. **Other apps by the same developer**
   ```sql
   SELECT DISTINCT pkg_name
   FROM app_info
   WHERE developer_name = '华为软件技术有限公司'
   ```

2. **Apps in the same category**
   ```sql
   SELECT DISTINCT pkg_name
   FROM app_info
   WHERE kind_name = '工具'  -- "Tools"
   ```

3. **Related recommendations**
   - Open the app detail page and look at the "related recommendations" section.
   - Extract the app_id of each recommended app.
### A.4 Common package name examples

```python
# Huawei system apps
HUAWEI_SYSTEM_APPS = [
    "com.huawei.hmsapp.appgallery",      # AppGallery
    "com.huawei.browser",                 # Browser
    "com.huawei.music",                   # Music
    "com.huawei.himovie",                 # Video
    "com.huawei.camera",                  # Camera
    "com.huawei.health",                  # Health
    "com.huawei.wallet",                  # Wallet
]

# Popular third-party apps (Android package names; HarmonyOS builds may use different bundle names)
POPULAR_APPS = [
    "com.tencent.mm",                     # WeChat
    "com.tencent.mobileqq",               # QQ
    "com.sina.weibo",                     # Weibo
    "com.taobao.taobao",                  # Taobao
    "com.jingdong.app.mall",              # JD.com
    "com.ss.android.ugc.aweme",           # Douyin
]

# HarmonyOS atomic services (glob-style package-name pattern)
ATOMIC_SERVICE_PATTERN = "com.atomicservice.*"
```
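The glob-style `ATOMIC_SERVICE_PATTERN` can be applied directly with the standard library's `fnmatch` module. `split_atomic_services` is a hypothetical helper. It uses `fnmatchcase` so that matching is case-sensitive on every OS; plain `fnmatch` follows the platform's case rules.

```python
from fnmatch import fnmatchcase

ATOMIC_SERVICE_PATTERN = "com.atomicservice.*"


def split_atomic_services(pkg_names: list[str]) -> tuple[list[str], list[str]]:
    """Partition package names into (atomic services, regular apps) using the glob pattern."""
    atomic = [p for p in pkg_names if fnmatchcase(p, ATOMIC_SERVICE_PATTERN)]
    regular = [p for p in pkg_names if not fnmatchcase(p, ATOMIC_SERVICE_PATTERN)]
    return atomic, regular
```

In SQL, the same filter is `WHERE pkg_name LIKE 'com.atomicservice.%'`.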
### A.5 Package naming conventions

Package names usually follow reverse-domain notation:

**Format:** `com.<company>.<app>`

**Examples:**
- `com.huawei.hmsapp.appgallery` - Huawei AppGallery
- `com.tencent.mm` - Tencent WeChat
- `com.alibaba.android.rimet` - Alibaba DingTalk

**HarmonyOS atomic services:**
- `com.atomicservice.{19 digits}` - package-name format of atomic services
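These conventions can serve as a rough classifier when grouping crawled package names. This is a heuristic sketch. `describe_pkg_name` is a hypothetical helper. The second segment is not always a recognizable company name; for example, it is `ss` in `com.ss.android.ugc.aweme`.

```python
def describe_pkg_name(pkg_name: str) -> dict:
    """Classify a package name using the naming conventions above (heuristic only)."""
    segments = pkg_name.split(".")
    # Atomic services: com.atomicservice.<19 digits>
    if (
        len(segments) == 3
        and segments[:2] == ["com", "atomicservice"]
        and len(segments[2]) == 19
        and segments[2].isdecimal()
    ):
        return {"kind": "atomic_service", "vendor": None, "segments": segments}
    # Regular apps: by convention the second segment names the company
    vendor = segments[1] if len(segments) >= 2 else None
    return {"kind": "app", "vendor": vendor, "segments": segments}
```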
### A.6 Utility scripts

#### Bulk-extract package names from URLs

```python
import asyncio
import re
from typing import List

# HuaweiAPI is the crawler client defined in the backend section of this document
from app.crawler.huawei_api import HuaweiAPI


async def extract_pkg_names_from_urls(urls: List[str], delay: float = 0.5) -> List[dict]:
    """Resolve a list of AppGallery URLs to package names."""
    api = HuaweiAPI()
    results = []

    try:
        for url in urls:
            # Extract the app_id from the URL (works for /app/... and /#/app/...)
            match = re.search(r'/app/([A-Z0-9]+)', url)
            if not match:
                print(f"Skipping {url}: no app_id found")
                continue

            app_id = match.group(1)

            try:
                app_data = await api.get_app_info(app_id=app_id)
                results.append({
                    'url': url,
                    'app_id': app_id,
                    'pkg_name': app_data['pkgName'],
                    'name': app_data['name'],
                })
            except Exception as e:
                print(f"Failed to process {url}: {e}")

            await asyncio.sleep(delay)  # throttle requests (see A.7)
    finally:
        await api.close()

    return results


if __name__ == "__main__":
    urls = [
        "https://appgallery.huawei.com/app/C1164531384803416384",
        "https://appgallery.huawei.com/app/C100000000000000001",
    ]

    results = asyncio.run(extract_pkg_names_from_urls(urls))
    for r in results:
        print(f"{r['name']}: {r['pkg_name']}")
```
#### Export the package-name list

```python
import csv

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.models.app_info import AppInfo


async def export_pkg_names_to_csv(db: AsyncSession, filename: str = "pkg_names.csv") -> None:
    """Export all package names to a CSV file."""
    result = await db.execute(
        select(AppInfo.pkg_name, AppInfo.name, AppInfo.developer_name)
        .order_by(AppInfo.name)
    )

    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Package name', 'App name', 'Developer'])

        for row in result:
            writer.writerow([row.pkg_name, row.name, row.developer_name])

    print(f"Exported to {filename}")
```
### A.7 Notes

1. **Package-name uniqueness**
   - Each app's package name is unique within Huawei AppGallery.
   - The same app usually has the same package name across app stores, but this is not guaranteed.

2. **Package-name format validation**
   ```python
   import re

   def is_valid_pkg_name(pkg_name: str) -> bool:
       """Check a package name against the strict lowercase convention."""
       pattern = r'[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+'
       return re.fullmatch(pattern, pkg_name) is not None

   # Examples
   print(is_valid_pkg_name("com.huawei.hmsapp.appgallery"))  # True
   print(is_valid_pkg_name("Com.Huawei.App"))                # False (uppercase)
   print(is_valid_pkg_name("huawei"))                        # False (fewer than 2 segments)
   ```
   This check is stricter than Android itself, which also allows uppercase letters in package names.

3. **Detecting atomic services**
   ```python
   def is_atomic_service(pkg_name: str) -> bool:
       """Return True if the package name belongs to an atomic service."""
       return pkg_name.startswith("com.atomicservice.")
   ```

4. **Request-rate limits**
   - Avoid sending requests too frequently.
   - Add a delay of 0.5-1 second between requests.
   - Keep concurrency bounded when processing in batches.

5. **Data-update strategy**
   - Update apps with high download counts first.
   - Periodically run a full sync of all known package names.
   - Store newly discovered package names promptly.
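The rate-limit advice above (a delay between requests plus bounded concurrency) can be combined in one helper. This sketch rests on a few assumptions:

- `throttled_map` is a hypothetical name.
- `fn` would be something like `lambda app_id: api.get_app_info(app_id=app_id)`.
- The defaults (5 concurrent requests, 0.5 s pause) are illustrative.

Each worker keeps its semaphore slot for `delay` seconds after its call, which spaces out consecutive requests on the same slot. Results come back in input order, and exceptions are returned instead of raised.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


async def throttled_map(
    fn: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    max_concurrency: int = 5,
    delay: float = 0.5,
) -> list[Union[R, BaseException]]:
    """Run fn over items with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(item: T) -> R:
        async with semaphore:
            try:
                return await fn(item)
            finally:
                # Hold the slot a little longer to space out requests
                await asyncio.sleep(delay)

    # return_exceptions=True: one failed ID does not abort the whole batch
    return await asyncio.gather(*(worker(i) for i in items), return_exceptions=True)
```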
---

## 7. Deployment guide

### 7.1 Docker deployment

#### 7.1.1 Backend Dockerfile

```dockerfile
# backend/Dockerfile
FROM python:3.11-slim

WORKDIR /app

# System packages needed to build mysqlclient
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    default-libmysqlclient-dev \
    pkg-config \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list first so this layer is cached between code changes
COPY requirements.txt .

# Install Python dependencies (requirements.txt must include playwright)
RUN pip install --no-cache-dir -r requirements.txt

# Install Chromium for Playwright; --with-deps also installs the system
# libraries it needs (libnss3, libgbm1, libasound2, ...)
RUN playwright install --with-deps chromium

# Copy the application code
COPY . .

# Expose the port
EXPOSE 8000

# Start command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
#### 7.1.2 Frontend Dockerfile

```dockerfile
# frontend/Dockerfile
FROM node:18-alpine AS builder

WORKDIR /app

# Copy the dependency manifests
COPY package*.json ./

# Install dependencies
RUN npm ci

# Copy the source code
COPY . .

# Vite inlines VITE_* variables at build time; /api is proxied to the backend by Nginx
ARG VITE_API_BASE_URL=/api
ENV VITE_API_BASE_URL=$VITE_API_BASE_URL

# Build
RUN npm run build

# Production stage
FROM nginx:alpine

# Copy the build output
COPY --from=builder /app/dist /usr/share/nginx/html

# Copy the Nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]
```
#### 7.1.3 Nginx configuration

```nginx
# frontend/nginx.conf
server {
    listen 80;
    server_name localhost;

    root /usr/share/nginx/html;
    index index.html;

    # Gzip compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    # Frontend routes (SPA history mode): fall back to index.html
    location / {
        try_files $uri $uri/ /index.html;
    }

    # API proxy: the /api prefix is passed to the backend unchanged
    location /api {
        proxy_pass http://backend:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Long-term caching for static assets; "immutable" is only safe for
    # content-hashed file names (Vite's build output), not files copied from public/
    location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```
#### 7.1.4 Docker Compose

```yaml
# docker-compose.yml
# The top-level "version" key is only read by the legacy docker-compose v1; Compose v2 ignores it
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: huawei_market_mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
      MYSQL_DATABASE: ${MYSQL_DATABASE}
      MYSQL_USER: ${MYSQL_USER}
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    ports:
      - "3306:3306"   # only needed to reach MySQL from the host
    volumes:
      - mysql_data:/var/lib/mysql
      - ./backend/sql:/docker-entrypoint-initdb.d
    command: --default-authentication-plugin=mysql_native_password
    healthcheck:
      # mysqladmin ping exits 0 once the server is up
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 10
    networks:
      - app_network

  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: huawei_market_backend
    restart: always
    environment:
      MYSQL_HOST: mysql
      MYSQL_PORT: 3306
      MYSQL_USER: ${MYSQL_USER}
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
      MYSQL_DATABASE: ${MYSQL_DATABASE}
    ports:
      - "8000:8000"
    depends_on:
      mysql:
        # A plain depends_on only waits for the container to start, not for MySQL to accept connections
        condition: service_healthy
    volumes:
      - ./backend:/app   # development only: overrides the code baked into the image
    networks:
      - app_network

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    container_name: huawei_market_frontend
    restart: always
    ports:
      - "80:80"
    depends_on:
      - backend
    networks:
      - app_network

volumes:
  mysql_data:

networks:
  app_network:
    driver: bridge
```
#### 7.1.5 Environment file

```env
# .env (keep this file out of version control)
MYSQL_ROOT_PASSWORD=root_password_here
MYSQL_DATABASE=huawei_market
MYSQL_USER=market_user
MYSQL_PASSWORD=user_password_here
```
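The backend receives the connection settings as separate `MYSQL_*` variables, as set in `docker-compose.yml`. The sketch below turns them into an async SQLAlchemy URL, assuming the `aiomysql` driver; `asyncmy` works the same way with `driver="asyncmy"`. `build_database_url` is a hypothetical helper. The user name and password are URL-encoded so that characters such as `@` or `:` cannot break the URL.

```python
import os
from typing import Optional
from urllib.parse import quote_plus


def build_database_url(env: Optional[dict] = None, driver: str = "aiomysql") -> str:
    """Assemble an async SQLAlchemy MySQL URL from MYSQL_* variables."""
    env = dict(os.environ) if env is None else env
    user = quote_plus(env["MYSQL_USER"])
    password = quote_plus(env["MYSQL_PASSWORD"])  # escape @ : / and similar characters
    host = env.get("MYSQL_HOST", "localhost")
    port = int(env.get("MYSQL_PORT", "3306"))
    database = env["MYSQL_DATABASE"]
    # utf8mb4 so that Chinese app names and emoji are stored correctly
    return f"mysql+{driver}://{user}:{password}@{host}:{port}/{database}?charset=utf8mb4"
```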
### 7.2 Deployment steps

#### 7.2.1 Preparation

```bash
# 1. Clone the project
git clone <your-repo-url>
cd huawei-market-crawler

# 2. Create the environment file
cp .env.example .env
# Edit .env and fill in the real values

# 3. Create the required directories
mkdir -p backend/logs
# No mysql_data directory is needed: mysql_data is a named volume managed by Docker
```
#### 7.2.2 Deploy with Docker Compose

```bash
# Build and start all services
# (with Compose v2 the command is `docker compose`, without the hyphen)
docker-compose up -d --build

# Check the service status
docker-compose ps

# Follow the backend logs
docker-compose logs -f backend

# Stop the services
docker-compose down

# Stop the services and delete the volumes (this erases all MySQL data)
docker-compose down -v
```
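#### 7.2.3 Initialize the database

The MySQL image runs the scripts mounted at `/docker-entrypoint-initdb.d` automatically, but only on the first start with an empty data volume. Run the script manually only if the `mysql_data` volume already contained a database:

```bash
# Open a MySQL shell inside the container
docker exec -it huawei_market_mysql mysql -u root -p

# Run the init script
mysql> USE huawei_market;
mysql> SOURCE /docker-entrypoint-initdb.d/init.sql;
```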
#### 7.2.4 Verify the deployment

```bash
# Check the backend health endpoint
curl http://localhost:8000/health

# Check the frontend
curl http://localhost/

# Test the API
curl http://localhost:8000/api/market_info
```
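Right after `docker-compose up -d`, the backend may need a few seconds before `/health` answers. A small polling helper is handy in deploy or CI scripts. This sketch assumes the `/health` response shape from section 7.4.2. `wait_until_healthy` and `fetch_json` are hypothetical helpers, and only the standard library is used. `fetch` is injectable so that the logic can be tested without a running server.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,
    interval: float = 2.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Poll the health endpoint until it reports "healthy" or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(url).get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # Connection refused, HTTP error (HTTPError is an OSError) or non-JSON body
            pass
        time.sleep(interval)
    return False
```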
### 7.3 Production tuning

#### 7.3.1 Run the backend with Gunicorn

Each Gunicorn worker is a separate process that imports `app.main`. If the APScheduler jobs are started there, every scheduled job runs once per worker (four times with `--workers 4`). Start the scheduler in a dedicated process or in exactly one worker.

```bash
# Install gunicorn
pip install gunicorn

# Start command (the logs/ directory must exist)
gunicorn app.main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --access-logfile logs/access.log \
  --error-logfile logs/error.log \
  --log-level info
```
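The same options can live in a Gunicorn config file. Gunicorn loads `./gunicorn.conf.py` by default, or you can pass `-c gunicorn.conf.py`. Because the file is plain Python, the worker count can be computed. The `(2 x CPU cores) + 1` starting point is the rule of thumb from the Gunicorn documentation. `WEB_CONCURRENCY` is the environment variable Gunicorn itself reads for its default worker count. Treat the values as a starting point to tune, not a recommendation.

```python
# gunicorn.conf.py: equivalent of the command-line flags above
import multiprocessing
import os

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"

# (2 x CPU cores) + 1 as a starting point; override with WEB_CONCURRENCY
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))

accesslog = "logs/access.log"
errorlog = "logs/error.log"
loglevel = "info"
```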
#### 7.3.2 MySQL tuning

```ini
# my.cnf
[mysqld]
# Basics
max_connections = 500
max_allowed_packet = 64M

# InnoDB
# Size the buffer pool to the memory available to MySQL (often 50-80% of RAM on a dedicated host)
innodb_buffer_pool_size = 2G
# Deprecated since MySQL 8.0.30 in favour of innodb_redo_log_capacity, but still accepted
innodb_log_file_size = 256M
# 2 = flush the redo log to disk about once per second: faster, but up to ~1 s of
# committed transactions can be lost on an OS crash or power failure
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

# The query cache (query_cache_type / query_cache_size) was removed in MySQL 8.0;
# setting those options makes the server refuse to start

# Slow query log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2
```
#### 7.3.3 Nginx production configuration

```nginx
# /etc/nginx/sites-available/huawei-market
server {
    listen 80;
    server_name your-domain.com;

    # Redirect to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name your-domain.com;

    # SSL certificates
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    # SSL settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Logs
    access_log /var/log/nginx/huawei-market-access.log;
    error_log /var/log/nginx/huawei-market-error.log;

    # Frontend
    location / {
        root /var/www/huawei-market/frontend;
        try_files $uri $uri/ /index.html;
    }

    # API
    location /api {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}
```
### 7.4 Monitoring and maintenance

#### 7.4.1 Log management

```python
# app/utils/logger.py
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logger(name: str, log_file: str, level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to a rotating file and to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Calling this twice for the same name would otherwise attach duplicate handlers
    if logger.handlers:
        return logger

    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )

    # Make sure the log directory exists (dirname is "" for a bare file name)
    log_dir = os.path.dirname(log_file)
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)

    # File handler with automatic rotation
    file_handler = RotatingFileHandler(
        log_file,
        maxBytes=10 * 1024 * 1024,  # 10 MB
        backupCount=5,
        encoding='utf-8',
    )
    file_handler.setFormatter(formatter)

    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    return logger
```

With several Gunicorn workers, each process opens its own `RotatingFileHandler` on the same file, and rotation is not coordinated between processes. Give each worker its own file, or log to the console only and let Docker collect the output.
#### 7.4.2 Health check

```python
# app/api/health.py
from datetime import datetime

from fastapi import APIRouter, Depends
from fastapi.responses import JSONResponse
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from app.database import get_db

router = APIRouter(tags=["health"])


@router.get("/health")
async def health_check(db: AsyncSession = Depends(get_db)):
    """Health check: report whether the database answers a trivial query."""
    try:
        # Check the database connection
        await db.execute(text("SELECT 1"))

        return {
            "status": "healthy",
            "database": "connected",
            "timestamp": datetime.now().isoformat(),
        }
    except Exception as e:
        # Return 503 so that load balancers and container health checks see the failure
        return JSONResponse(
            status_code=503,
            content={
                "status": "unhealthy",
                "database": "disconnected",
                "error": str(e),
                "timestamp": datetime.now().isoformat(),
            },
        )
```
#### 7.4.3 Performance monitoring

```python
# Monitoring with Prometheus + Grafana

# 1. Install prometheus-fastapi-instrumentator:
#    pip install prometheus-fastapi-instrumentator

# 2. Add the following to main.py
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
# Collects request metrics and serves them at GET /metrics for Prometheus to scrape
Instrumentator().instrument(app).expose(app)
```
### 7.5 Backup strategy

```bash
#!/bin/bash
# backup.sh - database backup script
set -euo pipefail

BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
DATABASE="huawei_market"
# Keep credentials in an option file (chmod 600) instead of on the command line,
# where other users could read them via `ps`. Example contents:
#   [client]
#   user=root
#   password=your_password
MYSQL_CNF="/root/.backup.my.cnf"

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# Dump and compress in one step (pipefail makes a mysqldump failure abort the script)
mysqldump --defaults-extra-file="$MYSQL_CNF" \
  --single-transaction \
  --routines \
  --triggers \
  "$DATABASE" | gzip > "$BACKUP_DIR/backup_$DATE.sql.gz"

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "backup_*.sql.gz" -mtime +7 -delete

echo "Backup complete: backup_$DATE.sql.gz"
```
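Backups may be managed from Python, or `find` may be unavailable (e.g. on a Windows host). In that case the 7-day retention step can be sketched as follows. `prune_old_backups` is a hypothetical helper that compares modification times against a cutoff. This is close to `find -mtime +7`, but not identical, because `find` rounds file ages down to whole days.

```python
import time
from pathlib import Path
from typing import Optional


def prune_old_backups(backup_dir: str, keep_days: int = 7, now: Optional[float] = None) -> list[Path]:
    """Delete backup_*.sql.gz files modified more than keep_days ago; return the deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    removed = []
    for path in Path(backup_dir).glob("backup_*.sql.gz"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return sorted(removed)
```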
---

## 8. Development recommendations and best practices

### 8.1 Code style

- **Python**: follow PEP 8 and format with Black
- **TypeScript**: use ESLint + Prettier
- **Commit messages**: follow the Conventional Commits specification

### 8.2 Testing strategy

```python
# tests/test_crawler.py
# Integration test: it calls the live AppGallery API, so it needs network access
# and the pytest-asyncio plugin (for @pytest.mark.asyncio)
import pytest

from app.crawler.huawei_api import HuaweiAPI


@pytest.mark.asyncio
async def test_get_app_info():
    api = HuaweiAPI()
    try:
        data = await api.get_app_info(pkg_name="com.huawei.hmsapp.appgallery")

        assert data['pkgName'] == "com.huawei.hmsapp.appgallery"
        assert 'name' in data
        assert 'appId' in data
    finally:
        # Close the client even when an assertion fails
        await api.close()
```
### 8.3 Performance optimization

1. **Database queries**
   - Use indexes
   - Avoid N+1 queries
   - Use a connection pool

2. **Caching**
   - Cache hot data in Redis
   - Use LocalStorage on the frontend

3. **Asynchronous processing**
   - Use async I/O
   - Process data in batches
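Where Redis is not available, a small in-process cache can serve the same purpose for data that changes rarely, such as the market statistics. This sketch is not a Redis replacement:

- `async_ttl_cache` is a hypothetical helper.
- Each worker process keeps its own copy of the cache.
- Arguments must be hashable.
- Concurrent cache misses may call the wrapped function more than once.

```python
import functools
import time


def async_ttl_cache(ttl: float):
    """Cache an async function's results per (args, kwargs) for `ttl` seconds, in-process only."""

    def decorator(fn):
        cache: dict = {}

        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()  # monotonic clock: unaffected by system time changes
            hit = cache.get(key)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]
            result = await fn(*args, **kwargs)
            cache[key] = (now, result)
            return result

        return wrapper

    return decorator
```

Usage: decorate a data-loading coroutine, e.g. `@async_ttl_cache(ttl=300)` above an `async def` that queries the market statistics.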
### 8.4 Security recommendations

1. **API security**
   - Add API rate limiting
   - Use JWT authentication (if needed)
   - Validate and sanitize input

2. **Database security**
   - Use parameterized queries
   - Apply the principle of least privilege
   - Back up regularly

3. **Crawler etiquette**
   - Respect robots.txt
   - Control the request rate
   - Use a reasonable User-Agent
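API rate limiting can be implemented with a token bucket: each client may make a burst of `capacity` requests and then `rate` requests per second. This sketch rests on a few assumptions:

- `TokenBucket` is a hypothetical class.
- State lives in a single process. With several Gunicorn workers or servers, the counters would have to be kept in a shared store such as Redis.
- Wiring it into FastAPI is left out. One way would be one bucket per client IP, returning HTTP 429 when `allow()` is False.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested with a fake clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means the caller should be throttled."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```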
---

## 9. FAQ

### Q1: What if obtaining the token fails?

**A:**
1. Check the network connection.
2. Make sure the Playwright browser is installed.
3. Open Huawei AppGallery manually and check whether a CAPTCHA is required.
4. Increase the wait time.

### Q2: Database connection timeouts?

**A:**
1. Check that the MySQL service is running.
2. Verify the connection settings.
3. Increase the connection-pool size.
4. Check the firewall settings.

### Q3: Crawling is too slow?

**A:**
1. Increase concurrency.
2. Process in batches.
3. Optimize database writes.
4. Consider distributed crawling across several servers.

### Q4: How do I deal with anti-crawling measures?

**A:**
1. Lower the request rate.
2. Use a proxy IP pool.
3. Mimic real browser behavior.
4. Refresh the token regularly.

---

## 10. References

- **FastAPI documentation**: https://fastapi.tiangolo.com/
- **Vue 3 documentation**: https://vuejs.org/
- **SQLAlchemy documentation**: https://docs.sqlalchemy.org/
- **Playwright documentation**: https://playwright.dev/python/
- **MySQL documentation**: https://dev.mysql.com/doc/

---

## Appendix B: Complete project file list

### Backend files
```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   ├── database.py
│   ├── models/
│   ├── schemas/
│   ├── api/
│   ├── crawler/
│   ├── scheduler/
│   └── utils/
├── tests/
├── logs/
├── requirements.txt
├── .env
├── Dockerfile
└── README.md
```

### Frontend files
```
frontend/
├── public/
├── src/
│   ├── assets/
│   ├── components/
│   ├── views/
│   ├── api/
│   ├── stores/
│   ├── types/
│   ├── utils/
│   ├── router/
│   ├── App.vue
│   └── main.ts
├── package.json
├── vite.config.ts
├── tsconfig.json
├── Dockerfile
├── nginx.conf
└── README.md
```

---

**Document version**: v1.0  
**Last updated**: 2024  
**Maintainer**: [Your Name]  
**License**: MIT

---

## Appendix C: Package-name discovery strategies in the original project

The original Rust project uses several inventive approaches to discovering and obtaining app package names, and they are well worth borrowing.

### C.1 Overview of the core strategies

The original project provides **7 standalone tools** for obtaining package names and app data:

| Tool | Purpose | Strategy |
|------|---------|----------|
| `guess_market` | App ID guessing | Iterate over a specified range of app IDs |
| `guess_rand` | Random guessing | Probe randomly generated app IDs |
| `guess_from_db` | Database expansion | Infer neighboring IDs from existing data |
| `guess_large` | Large-scale guessing | Scan a wide ID range |
| `get_nextmax` | Third-party data source | Fetch from nextmax.cn |
| `read_appgallery` | AppGallery crawling | Crawl Huawei AppGallery pages directly |
| `read_pkg_name` | Bulk import | Read a list of package names from a file |
 | ||
| 
 | ||
### C.2 Methods in Detail

#### C.2.1 App ID Guessing (guess_market)

**Principle:** A Huawei app_id is a fixed prefix followed by digits. Apps can therefore be discovered by iterating over a numeric range.

**app_id format:**
```
C576588020785 + 7 digits
Example: C5765880207856366961
```

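Because the numeric suffix is zero-padded to a fixed width, building an ID is just a formatting step, and rejecting malformed candidates is a pattern match. This is a minimal sketch. It assumes every ID is the letter `C` followed by 19 digits. That holds for both prefix series used in this appendix: `C576588020785` plus 7 digits, and `C69175` plus 14 digits. The helper names are hypothetical.

```python
import re

# Assumption: app IDs are "C" followed by 19 digits, as in this appendix's examples
APP_ID_PATTERN = re.compile(r"C\d{19}")


def make_app_id(prefix: str, number: int, width: int) -> str:
    """Append `number`, zero-padded to `width` digits, to `prefix`."""
    if number < 0 or len(str(number)) > width:
        raise ValueError(f"{number} does not fit in {width} digits")
    return f"{prefix}{number:0{width}d}"


def looks_like_app_id(candidate: str) -> bool:
    """Cheap local check before spending a request on a candidate ID."""
    return APP_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    app_id = make_app_id("C576588020785", 6366961, 7)
    print(app_id, looks_like_app_id(app_id))
```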
**Core logic:**
```rust
use std::time::Duration;

// Scan range (inclusive) and fixed ID prefix
let range = 2_000_000_u64..=6_390_000;
let start = "C576588020785";
let range_vec: Vec<u64> = range.collect();

// Process in batches of 1000
for bunch_id in range_vec.chunks(1000) {
    let mut join_set = tokio::task::JoinSet::new();

    for id in bunch_id.iter() {
        let app_id = format!("{start}{id:07}"); // zero-pad to 7 digits

        // Assumption: these handles are cheap to clone (e.g. reqwest::Client, Arc<..>, String),
        // so each spawned task can own a copy, as JoinSet::spawn requires 'static futures
        let (client, db, api_url, locale, comment) =
            (client.clone(), db.clone(), api_url.clone(), locale.clone(), comment.clone());

        // Query the Huawei API asynchronously
        join_set.spawn(async move {
            if let Ok(data) = query_app(&client, &api_url, &AppQuery::app_id(&app_id), &locale).await {
                // Save to the database (the result is discarded)
                let _ = db.save_app_data(&data.0, data.1.as_ref(), None, Some(comment)).await;
            }
        });
    }

    join_set.join_all().await;
    tokio::time::sleep(Duration::from_millis(25)).await; // delay between batches
}
```

**Python implementation example:**
```python
import asyncio

# HuaweiAPI and Database: the project's API client and data-access classes
# (not defined in this appendix)


async def guess_market_apps(
    start_prefix: str = "C576588020785",
    start_range: int = 2000000,
    end_range: int = 6390000,
    batch_size: int = 1000,
):
    """Discover apps by guessing IDs in the half-open range [start_range, end_range)."""
    api = HuaweiAPI()
    db = Database()

    for batch_start in range(start_range, end_range, batch_size):
        batch_end = min(batch_start + batch_size, end_range)
        tasks = []

        for i in range(batch_start, batch_end):
            app_id = f"{start_prefix}{i:07d}"  # 7 digits, zero-padded
            tasks.append(try_fetch_app(api, db, app_id))

        # Run the batch concurrently
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # try_fetch_app returns True/False, so only True counts as a hit
        success_count = sum(1 for r in results if r is True)
        print(f"Batch {batch_start}-{batch_end}: {success_count}/{len(tasks)} found")

        # Delay between batches
        await asyncio.sleep(0.025)


async def try_fetch_app(api: "HuaweiAPI", db: "Database", app_id: str) -> bool:
    """Try to fetch a single app; return True if it exists and was saved."""
    try:
        app_data = await api.get_app_info(app_id=app_id)
        rating_data = await api.get_app_rating(app_id)

        await db.save_app_data(app_data, rating_data, comment={
            "user": "guess_market",
            "method": "id_guessing",
        })

        print(f"✓ Found app: {app_data['name']} ({app_data['pkgName']})")
        return True
    except Exception:
        # App does not exist or the request failed: skip silently
        return False
```

**Known app ID prefixes:**
```python
KNOWN_APP_ID_PREFIXES = [
    "C576588020785",  # main prefix (followed by 7 digits)
    "C69175",         # another prefix series (followed by 14 digits)
    # More prefixes can be found by analyzing existing data
]
```

#### C.2.2 Random Guessing (guess_rand)

**Principle:** Generate IDs at random within a known ID range, so that probes are spread across the whole space instead of walking it in order.

**When to use:**
- The ID space is very large and sequential traversal is inefficient
- You want to find popular apps quickly (these usually have newer IDs)

**Core logic:**
```rust
let code_start = 59067092904725_u64;
let size = 85170011059280_u64 - code_start;
let start = "C69175";

loop {
    // rng: the original project's random number generator (its type is not shown in the excerpt)
    let mut ids: Vec<u64> = Vec::with_capacity(1000);
    for _ in 0..1000 {
        let id = code_start + (rng.next() % size); // random ID in [code_start, code_start + size)
        ids.push(id);
    }

    // Process this batch of random IDs
    // ...
}
```

**Python implementation:**
```python
import asyncio
import random


async def guess_random_apps(
    prefix: str = "C69175",
    start: int = 59067092904725,
    end: int = 85170011059280,
    batch_size: int = 1000,
):
    """Guess app IDs at random; runs until cancelled."""
    api = HuaweiAPI()
    db = Database()

    while True:
        # Generate a batch of random IDs in [start, end), matching the Rust excerpt
        random_ids = [
            f"{prefix}{random.randrange(start, end)}"
            for _ in range(batch_size)
        ]

        tasks = [try_fetch_app(api, db, app_id) for app_id in random_ids]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        success_count = sum(1 for r in results if r is True)
        print(f"Random batch: {success_count}/{batch_size} found")

        await asyncio.sleep(0.005)
```

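Independent random draws can hit the same ID twice and waste a request. This minimal sketch assumes you want every probed ID to be unique within one run. It draws with `random.Random.randrange` and skips numbers it has already produced. The generator name is hypothetical. The `seen` set grows with every ID yielded, so very long runs would need to bound or persist it.

```python
import random
from typing import Iterator, Optional


def unique_random_ids(
    prefix: str, start: int, end: int, seed: Optional[int] = None
) -> Iterator[str]:
    """Yield app IDs built from distinct random numbers in [start, end).

    Stops after every number in the range has been produced once.
    """
    rng = random.Random(seed)  # seedable, so a run can be reproduced
    seen = set()
    total = end - start
    while len(seen) < total:
        n = rng.randrange(start, end)
        if n in seen:
            continue  # already probed: draw again
        seen.add(n)
        yield f"{prefix}{n}"


if __name__ == "__main__":
    gen = unique_random_ids("C69175", 59067092904725, 85170011059280, seed=1)
    for _, app_id in zip(range(3), gen):
        print(app_id)
```

Rejection sampling like this slows down only when most of the range has been drawn. That is not a concern for a span of roughly 2.6 × 10^13 IDs.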
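#### C.2.3 Database Expansion (guess_from_db)

**Principle:** Start from app IDs that are already known and assume that neighboring IDs are likely to be valid apps too.

**Strategy:**
1. Fetch all known app_ids from the database
2. Split each app_id into its prefix and numeric part
3. Generate the range ±1000 around each number
4. Merge overlapping ranges
5. Scan the merged ranges
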
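Steps 3 and 4 are the classic interval-merge problem. Sort the ranges by start, then fold each range into the previous one when the two overlap or touch. This is a standalone sketch with inclusive bounds, clamped at 0 like the `saturating_sub` in the Rust excerpt. The function name is illustrative.

```python
from typing import List, Tuple


def expand_and_merge(numbers: List[int], radius: int) -> List[Tuple[int, int]]:
    """Turn each n into [n - radius, n + radius] and merge overlapping or adjacent ranges.

    Bounds are inclusive; the lower bound is clamped at 0.
    """
    ranges = sorted((max(0, n - radius), n + radius) for n in numbers)
    merged: List[Tuple[int, int]] = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


if __name__ == "__main__":
    # Centres 600 and 1000 with radius 500 give (100, 1100) and (500, 1500)
    print(expand_and_merge([1000, 600], 500))
```

Merging first means each ID is probed once, even when the neighborhoods of many known apps overlap.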
**Core logic:**
```rust
use std::collections::HashSet;

// 1. Fetch all known app_ids
let existing_app_ids = db.get_all_app_ids().await?;

// 2. Build an expansion range around each app_id
let mut all_ranges = HashSet::new();
for app_id in existing_app_ids {
    if let Some((prefix, numeric_part)) = parse_app_id(&app_id) {
        let start_range = numeric_part.saturating_sub(1000);
        let end_range = numeric_part.saturating_add(1000);
        all_ranges.insert((prefix, start_range, end_range));
    }
}

// 3. Merge overlapping ranges for the same prefix
//    e.g. (100, 1100) and (500, 1500) merge into (100, 1500)
//    (the merge code is not shown in the original excerpt; it produces `merged_ranges`)

// 4. Scan the merged ranges (inclusive bounds)
for (prefix, start, end) in merged_ranges {
    for id in start..=end {
        let app_id = format!("{prefix}{id}");
        // Try to fetch the app
    }
}
```

**Python implementation:**
```python
import re
from typing import Dict, List, Optional, Tuple


def parse_app_id(app_id: str) -> Optional[Tuple[str, int]]:
    """Split an app_id into (letter prefix, number).

    For C5765880207856366961 this yields ("C", 5765880207856366961).
    """
    match = re.match(r'^([A-Z]+)(\d+)$', app_id)
    if match:
        return match.group(1), int(match.group(2))
    return None


async def guess_from_database(expand_range: int = 1000):
    """Expand outward from the app IDs already in the database."""
    db = Database()

    # 1. Fetch all known app_ids
    existing_ids = await db.get_all_app_ids()

    # 2. Build inclusive expansion ranges, grouped by prefix
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    for app_id in existing_ids:
        parsed = parse_app_id(app_id)
        if not parsed:
            continue

        prefix, num = parsed
        start = max(0, num - expand_range)
        end = num + expand_range
        ranges.setdefault(prefix, []).append((start, end))

    # 3. Merge overlapping ranges
    merged_ranges: Dict[str, List[Tuple[int, int]]] = {}
    for prefix, range_list in ranges.items():
        range_list.sort()
        merged = []
        current = range_list[0]

        for r in range_list[1:]:
            if r[0] <= current[1] + 1:
                # Overlapping or adjacent: merge
                current = (current[0], max(current[1], r[1]))
            else:
                merged.append(current)
                current = r
        merged.append(current)
        merged_ranges[prefix] = merged

    # 4. Scan the ranges
    for prefix, range_list in merged_ranges.items():
        for start, end in range_list:
            print(f"Scanning: {prefix}{start} - {prefix}{end}")
            # guess_market_apps treats its end bound as exclusive, hence end + 1.
            # The numbers here are wider than 7 digits, so its zero-padding leaves them unchanged.
            await guess_market_apps(prefix, start, end + 1)
```

#### C.2.4 Bulk Import from a File (read_pkg_name)

**Principle:** Read a list of package names from a text file and fetch the app data in bulk.

**Usage:**
```bash
# Create the package-name list file
cat > pkg_names.txt << EOF
com.huawei.hmsapp.appgallery
com.tencent.mm
com.sina.weibo
EOF

# Run the tool
cargo run --bin read_pkg_name pkg_names.txt
```

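Hand-maintained lists tend to contain blank lines, quoted names and duplicates. This is a minimal cleaning sketch that assumes one package name per line. It also treats lines starting with `#` as comments, a convention added here rather than one the original tool documents. The validation pattern is a loose assumption about package-name syntax: at least two dot-separated segments, each starting with a letter.

```python
import re
from typing import Iterable, List

# Assumed loose package-name syntax: two or more dot-separated segments,
# each starting with a letter and containing letters, digits or underscores
PKG_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)+")


def clean_pkg_names(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and quotes, skip blanks, comments and invalid names, de-duplicate in order."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip().strip('"').strip("'")
        if not name or name.startswith("#"):
            continue
        if not PKG_NAME_PATTERN.fullmatch(name):
            continue  # not a plausible package name
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


if __name__ == "__main__":
    sample = ['com.tencent.mm\n', '"com.sina.weibo"\n', '\n', 'com.tencent.mm\n']
    print(clean_pkg_names(sample))
```

With a real file, `clean_pkg_names(f)` can be called directly on the open file object, since iterating a text file yields its lines.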
**Core code:**
```rust
use std::io::BufRead; // brings BufReader::lines into scope

// Get the file path from the command-line arguments
let cli_file = std::env::args()
    .nth(1)
    .ok_or_else(|| anyhow::anyhow!("No file path provided"))?;

// Read package names from the file: one per line, surrounding quotes removed, blank lines skipped
let pkg_names: Vec<String> = {
    let file = std::fs::File::open(&cli_file)?;
    let reader = std::io::BufReader::new(file);
    let mut pkg_names = Vec::new();
    for line in reader.lines() {
        let name = line?.trim().trim_matches('"').to_string();
        if !name.is_empty() {
            pkg_names.push(name);
        }
    }
    pkg_names
};

// Batch sync (the excerpt does not show how pkg_names is handed to sync_all)
sync::sync_all(&client, &db, &config).await?;
```

**Python implementation:**
```python
import asyncio


async def read_pkg_names_from_file(filepath: str):
    """Read package names from a file and fetch them in bulk."""
    # Read the package-name list (skip blank lines, strip quotes)
    with open(filepath, 'r', encoding='utf-8') as f:
        pkg_names = [
            line.strip().strip('"').strip("'")
            for line in f
            if line.strip()
        ]

    print(f"Read {len(pkg_names)} package names from the file")

    # Fetch in batches of 100
    api = HuaweiAPI()
    db = Database()

    for i in range(0, len(pkg_names), 100):
        batch = pkg_names[i:i+100]
        tasks = [
            fetch_and_save_app(api, db, pkg_name)
            for pkg_name in batch
        ]
        await asyncio.gather(*tasks, return_exceptions=True)
        print(f"Processed {min(i+100, len(pkg_names))}/{len(pkg_names)}")


async def fetch_and_save_app(api: "HuaweiAPI", db: "Database", pkg_name: str):
    """Fetch and save a single app."""
    try:
        app_data = await api.get_app_info(pkg_name=pkg_name)
        rating_data = await api.get_app_rating(app_data['appId'])
        await db.save_app_data(app_data, rating_data)
        print(f"✓ {pkg_name}")
    except Exception as e:
        print(f"✗ {pkg_name}: {e}")
```

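#### C.2.5 Bulk Fetching from Substances (Themes/Collections)

**Principle:** Huawei AppGallery has "theme" and "collection" pages. Each such substance groups several apps, so one request can yield many app IDs.

**Substance page ID format** (the substance ID is the part after `|`):
```
Example: webAgSubstanceDetail|12345
```
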
**Core logic:**
```rust
use anyhow::{Context, Result};
use serde_json::Value as JsonValue;

pub async fn get_app_from_substance(
    client: &reqwest::Client,
    api_url: &str,
    substance_id: impl ToString,
) -> Result<(SubstanceData, JsonValue)> {
    let id = substance_id.to_string();

    // 1. Request the substance detail page
    let body = serde_json::json!({
        "pageId": format!("webAgSubstanceDetail|{id}"),
        "pageNum": 1,
        "pageSize": 100,
        "zone": "",
        "businessParam": { "animation": 0 }
    });

    let response = client
        .post(format!("{api_url}/harmony/page-detail"))
        .json(&body)
        .send()
        .await?;

    let data = response.json::<JsonValue>().await?;

    // 2. Parse the card data and extract app IDs
    let layouts = data["pages"][0]["data"]["cardlist"]["layoutData"]
        .as_array()
        .context("response has no layoutData")?;

    let mut apps = Vec::new();
    for card in layouts {
        let entries = match card["type"].as_str().unwrap_or_default() {
            // Vertical list card: apps sit directly in `data`
            "com.huawei.hmsapp.appgallery.verticallistcard" => card["data"].as_array(),
            // Scenario list card: apps sit in `data[0].refsList_app`
            "com.huawei.hmos.appgallery.scenariolistcard.landing" => {
                card["data"][0]["refsList_app"].as_array()
            }
            _ => None,
        };
        for app in entries.into_iter().flatten() {
            if let Some(app_id) = app["appId"].as_str() {
                apps.push(AppQuery::app_id(app_id));
            }
        }
    }

    // 3. If there are more pages, keep fetching
    //    (`card_id` and `title` come from parts of the original code not shown in the excerpt)
    if data["hasMore"].as_i64().unwrap_or(0) != 0 {
        let more_apps = get_more_substance(client, api_url, card_id).await?;
        apps.extend(more_apps);
    }

    Ok((SubstanceData { id, title, apps }, data))
}
```

**Python implementation:**
```python
from typing import List, Optional

VERTICAL_LIST_CARD = "com.huawei.hmsapp.appgallery.verticallistcard"
SCENARIO_LIST_CARD = "com.huawei.hmos.appgallery.scenariolistcard.landing"


async def get_apps_from_substance(substance_id: str) -> List[str]:
    """Get the app IDs listed in a theme/collection (substance)."""
    api = HuaweiAPI()

    url = f"{api.base_url}/harmony/page-detail"
    body = {
        "pageId": f"webAgSubstanceDetail|{substance_id}",
        "pageNum": 1,
        "pageSize": 100,
        "zone": "",
        "businessParam": {"animation": 0},
    }

    tokens = await api.token_manager.get_token()
    headers = {
        "Content-Type": "application/json",
        "Interface-Code": tokens["interface_code"],
        "identity-id": tokens["identity_id"],
    }

    response = await api.client.post(url, json=body, headers=headers)
    response.raise_for_status()
    data = response.json()

    app_ids = []
    layouts = data["pages"][0]["data"]["cardlist"]["layoutData"]

    for card in layouts:
        card_type = card.get("type", "")
        card_data = card.get("data", [])

        if card_type == VERTICAL_LIST_CARD:
            for app in card_data:
                if "appId" in app:
                    app_ids.append(app["appId"])

        elif card_type == SCENARIO_LIST_CARD:
            if card_data and "refsList_app" in card_data[0]:
                for app in card_data[0]["refsList_app"]:
                    if "appId" in app:
                        app_ids.append(app["appId"])

    # Pagination (field paths follow the original excerpt; verify against a real response)
    if data.get("hasMore", 0) != 0:
        card_id = data["cardlist"]["dataId"]
        more_apps = await get_more_substance_pages(api, card_id, headers)
        app_ids.extend(more_apps)

    return app_ids


async def get_more_substance_pages(
    api: "HuaweiAPI", card_id: str, headers: Optional[dict] = None, max_pages: int = 100
) -> List[str]:
    """Fetch the remaining pages of a substance."""
    app_ids = []
    page_num = 2
    has_more = True

    # max_pages guards against an endless loop if hasMore never drops to 0
    while has_more and page_num <= max_pages:
        url = f"{api.base_url}/harmony/card-list"
        body = {
            "dataId": card_id,
            "locale": "zh",
            "pageNum": page_num,
            "pageSize": 25,
        }

        # Assumption: card-list accepts the same token headers as page-detail
        response = await api.client.post(url, json=body, headers=headers)
        response.raise_for_status()
        data = response.json()

        has_more = data.get("hasMore", 0) != 0
        page_num += 1

        for card in data.get("layoutData", []):
            if card.get("type") == VERTICAL_LIST_CARD:
                for app in card.get("data", []):
                    if "appId" in app:
                        app_ids.append(app["appId"])

    return app_ids
```

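The card-parsing step can be pulled out into a pure function so that it can be unit-tested against a saved response without network access. This is a sketch. The sample payload is hand-made to match the field names read above; it is not a captured AppGallery response. The function also drops duplicate IDs, on the assumption that the same app can appear in more than one card.

```python
from typing import Any, Dict, List

VERTICAL_LIST_CARD = "com.huawei.hmsapp.appgallery.verticallistcard"
SCENARIO_LIST_CARD = "com.huawei.hmos.appgallery.scenariolistcard.landing"


def extract_app_ids(layouts: List[Dict[str, Any]]) -> List[str]:
    """Collect appId values from the two card types that carry apps, without duplicates."""
    app_ids: List[str] = []
    seen = set()
    for card in layouts:
        card_type = card.get("type", "")
        card_data = card.get("data") or []
        if card_type == VERTICAL_LIST_CARD:
            entries = card_data  # apps sit directly in `data`
        elif card_type == SCENARIO_LIST_CARD:
            # apps sit in data[0]["refsList_app"]
            entries = card_data[0].get("refsList_app", []) if card_data else []
        else:
            entries = []  # other card types carry no app list
        for app in entries:
            app_id = app.get("appId")
            if app_id and app_id not in seen:
                seen.add(app_id)
                app_ids.append(app_id)
    return app_ids


if __name__ == "__main__":
    # Hand-made sample shaped like the fields read above (not a real response)
    sample_layouts = [
        {"type": VERTICAL_LIST_CARD, "data": [{"appId": "C1"}, {"appId": "C2"}]},
        {"type": SCENARIO_LIST_CARD, "data": [{"refsList_app": [{"appId": "C2"}, {"appId": "C3"}]}]},
        {"type": "some.other.card", "data": [{"appId": "C9"}]},
    ]
    print(extract_app_ids(sample_layouts))
```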
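### C.3 Recommended Combined Strategy

**Initial phase (cold start):**
1. Use `guess_market` to scan the known ID ranges
2. Crawl popular apps from the AppGallery home page
3. Manually collect the package names of some well-known apps

**Expansion phase:**
1. Use `guess_from_db` to expand from existing data
2. Use `guess_rand` to discover new apps at random
3. Periodically fetch in bulk from substances (themes/collections)

**Maintenance phase:**
1. Periodically sync updates for known package names
2. Watch for patterns in newly appearing app IDs
3. Collect new package names from user submissions

**Efficiency tips:**
```python
import asyncio


# Example of combining the strategies
async def comprehensive_discovery() -> asyncio.Task:
    """Combined discovery strategy."""

    # 1. Expand from the database first (high hit rate)
    await guess_from_database(expand_range=500)

    # 2. Scan the hot ID segment
    await guess_market_apps("C576588020785", 6000000, 6400000)

    # 3. Random probing in the background (finds new apps);
    #    keep a reference so the task is not garbage-collected
    random_task = asyncio.create_task(guess_random_apps())

    # 4. Periodically sync known apps (sync_known_apps is not defined in this appendix)
    await sync_known_apps()

    # guess_random_apps loops forever; the caller decides when to cancel it
    return random_task
```
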
### C.4 Notes

1. **Request rate control**
   - Delay between batches: 25-50 ms
   - Per-request timeout: 30 s
   - Concurrency: preferably no more than 1000

2. **Error handling**
   - App does not exist: skip silently
   - Network error: retry 3 times
   - Token expired: refresh automatically

3. **Deduplication**
   - Use app_id or pkg_name as the unique key
   - Check whether the record already exists before inserting

4. **Performance monitoring**
   - Track the success (discovery) rate
   - Monitor request latency
   - Count the new apps discovered per hour

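The "retry 3 times on network errors" rule can be wrapped in a small helper that retries only transient failures and backs off between attempts. This is a sketch. `TransientError` is a hypothetical placeholder for whatever network exceptions your HTTP client raises; with httpx, `httpx.TransportError` would be a natural choice. A "does not exist" response should be signalled with a different exception so that it is not retried.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Placeholder for retryable network errors (timeouts, dropped connections)."""


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
) -> T:
    """Await make_call(); on a retryable error, back off exponentially and retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: propagate the last error
            # base_delay, 2x, 4x, ... plus up to 10% jitter so workers do not retry in lockstep
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay * (1 + random.random() * 0.1))
    raise AssertionError("unreachable when retries >= 0")
```

A call site would look like `await with_retries(lambda: api.get_app_info(app_id=app_id))`. The lambda builds a fresh coroutine for each attempt, because a coroutine object cannot be awaited twice.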
Used in combination, these methods let the original project discover and collect AppGallery app data efficiently.