Feature/skx pytest#800
Conversation
合并上游Fix/refine deepresearch (modelscope#742)
- Add comprehensive documentation for role C (Integration Engineer & Verification Loop) - Create analysis, goals and task breakdown documents in skx-docs - Integrate orchestrator components from chao/docs branch - Update projects/code_scratch configurations - Modify .gitignore to include skx-docs in version control - Implement external integration modules for role C
- Implement CodeScratchCaller for blackbox calling of projects/code_scratch - Implement DeepResearchCaller for blackbox calling of projects/deep_research - Implement PromptInjector for meta-instruction construction - Implement TestRunner for automated testing and error feedback - Update CodeAdapter to use real implementations instead of mocks - Create test file for external integration modules
- Update devcontainer.json to use Python 3.10 as default - Sync devcontainer configuration with chao/docs branch
- Add user_operation_guide.md with complete workflow instructions - Remove git_config_guide.md as it's no longer needed in docs
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive development environment using DevContainers and a new orchestration layer for the agent. The setup is very thorough and includes detailed documentation, which is great.
My review focuses on improving the security, correctness, and maintainability of the new configuration and scripts. I've identified some critical security concerns in the Docker configuration, such as the use of seccomp:unconfined and host networking, which should be addressed. There are also a few inconsistencies and potential bugs in the setup scripts and configuration files. Additionally, I've provided suggestions for optimizing the Dockerfile and refactoring some Python code for better practices.
Overall, this is a significant contribution that will greatly improve the development workflow. Addressing the feedback will help ensure the environment is secure and robust.
| security_opt: | ||
| - seccomp:unconfined |
There was a problem hiding this comment.
Disabling seccomp with seccomp:unconfined significantly reduces container security by removing the default syscall restrictions. This, combined with --cap-add=SYS_PTRACE from devcontainer.json, gives the container extensive privileges on the host. This should be avoided unless absolutely necessary for debugging. If required, please document the reason clearly. Consider using a custom, more restrictive seccomp profile if possible.
| "--hostname", | ||
| "ms-agent-dev", | ||
| "--add-host=host.docker.internal:host-gateway", | ||
| "--network=host" |
There was a problem hiding this comment.
Using --network=host removes network isolation between the container and the host, which can be a security risk. The container gets access to the host's entire network stack. If this is for convenience (e.g., accessing services on localhost), consider using host.docker.internal or explicitly publishing ports instead. If host networking is required, please document the reason.
| environment: | ||
| - PYTHONPATH=/workspace | ||
| - PYTHONUNBUFFERED=1 | ||
| - PYTHON_VERSION=3.11 |
| git config --global credential.helper store | ||
|
|
||
| # 创建 GitHub 凭据文件 | ||
| mkdir -p ~/.git-credentials | ||
| echo "https://oauth2:${GITHUB_TOKEN}@github.com" > ~/.git-credentials | ||
| chmod 600 ~/.git-credentials |
There was a problem hiding this comment.
Using credential.helper store saves the GitHub token in plaintext in ~/.git-credentials. This is a security risk, even inside a container. Consider using a more secure credential helper like cache to store the token in memory for a limited time. If you need to automate authentication, look into using the GitHub CLI (gh) for authentication, which is more secure.
| echo "🚀 开始安装项目依赖..." | ||
|
|
||
| # 切换到workspace目录(项目代码在此) | ||
| cd /workspaces/seu-ms-agent |
There was a problem hiding this comment.
The path /workspaces/seu-ms-agent is hardcoded. The docker-compose.yml mounts the project to /workspace. This hardcoded path is likely incorrect and will cause the script to fail. It's better to use a relative path or an environment variable. Given the context, it should probably be /workspace.
| cd /workspaces/seu-ms-agent | |
| cd /workspace | |
| RUN sed -i 's|http://archive.ubuntu.com/ubuntu/|https://mirrors.aliyun.com/ubuntu/|g' /etc/apt/sources.list \ | ||
| && sed -i 's|http://security.ubuntu.com/ubuntu/|https://mirrors.aliyun.com/ubuntu/|g' /etc/apt/sources.list \ | ||
| && printf 'Acquire::Retries "5";\nAcquire::http::Timeout "30";\n' > /etc/apt/apt.conf.d/80-retries | ||
|
|
||
| # 创建用户(避免使用root) | ||
| ARG USERNAME=vscode | ||
| ARG USER_UID=1000 | ||
| ARG USER_GID=$USER_UID | ||
|
|
||
| # 更新系统并安装基础软件包 | ||
| RUN apt-get update && apt-get install -y \ | ||
| # 基础工具 | ||
| sudo \ | ||
| curl \ | ||
| wget \ | ||
| git \ | ||
| vim \ | ||
| nano \ | ||
| unzip \ | ||
| zip \ | ||
| build-essential \ | ||
| cmake \ | ||
| pkg-config \ | ||
| # Python 3.10相关 | ||
| python3.10 \ | ||
| python3-pip \ | ||
| python3-venv \ | ||
| python3-dev \ | ||
| python3-distutils \ | ||
| # 网络工具 | ||
| openssh-client \ | ||
| # 语言环境 | ||
| locales \ | ||
| # 其他有用工具 | ||
| htop \ | ||
| tree \ | ||
| jq \ | ||
| # 配置语言环境并清理缓存 | ||
| && locale-gen en_US.UTF-8 \ | ||
| && update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \ | ||
| && apt-get clean \ | ||
| && rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
| def _parse_pytest_output(self, stdout: str, | ||
| stderr: str) -> List[Dict[str, Any]]: | ||
| """ | ||
| 解析pytest输出,提取关键错误信息 | ||
|
|
||
| Args: | ||
| stdout: pytest的标准输出 | ||
| stderr: pytest的错误输出 | ||
|
|
||
| Returns: | ||
| 解析后的错误信息列表 | ||
| """ | ||
| errors = [] | ||
|
|
||
| # 解析输出以提取错误信息 | ||
| output = stdout + stderr | ||
|
|
||
| # 查找测试失败的相关信息 | ||
| lines = output.split('\n') | ||
| current_error = None | ||
|
|
||
| for line in lines: | ||
| # 检查是否是错误行 | ||
| if 'FAIL' in line and '::' in line: | ||
| # 这是一个失败的测试 | ||
| parts = line.split() | ||
| for part in parts: | ||
| if '::' in part and 'FAIL' not in part: | ||
| test_name = part | ||
| errors.append({ | ||
| 'type': 'test_failure', | ||
| 'test_name': test_name, | ||
| 'message': line.strip() | ||
| }) | ||
| break | ||
| elif 'ERROR' in line and '::' in line: | ||
| # 这是一个错误的测试 | ||
| parts = line.split() | ||
| for part in parts: | ||
| if '::' in part and 'ERROR' not in part: | ||
| test_name = part | ||
| errors.append({ | ||
| 'type': 'test_error', | ||
| 'test_name': test_name, | ||
| 'message': line.strip() | ||
| }) | ||
| break | ||
| elif line.strip().startswith('E '): | ||
| # 这是具体的错误详情 | ||
| if errors: | ||
| errors[-1]['detailed_error'] = line.strip()[ | ||
| 4:] # 移除 'E ' 前缀 | ||
|
|
||
| # 尝试解析Python异常 | ||
| in_traceback = False | ||
| for i, line in enumerate(lines): | ||
| if 'Traceback' in line and '(most recent call last)' in line: | ||
| in_traceback = True | ||
| continue | ||
|
|
||
| if in_traceback: | ||
| if line.strip().startswith('File "'): | ||
| # 提取异常信息 | ||
| error_info = {'type': 'exception', 'traceback': []} | ||
|
|
||
| # 收集整个traceback | ||
| j = i | ||
| while j < len(lines) and not (lines[j].strip() == '' | ||
| and j > i + 5): | ||
| if j >= len(lines): | ||
| break | ||
| error_info['traceback'].append(lines[j]) | ||
| if j > i and not lines[j].strip().startswith( | ||
| ' ') and not lines[j].strip().startswith( | ||
| 'File "'): | ||
| # 可能是异常类型和消息 | ||
| if ':' in lines[j]: | ||
| parts = lines[j].split(':', 1) | ||
| error_info['exception_type'] = parts[0].strip() | ||
| error_info['exception_message'] = parts[ | ||
| 1].strip() | ||
| break | ||
| j += 1 | ||
|
|
||
| errors.append(error_info) | ||
| in_traceback = False | ||
|
|
||
| return errors |
There was a problem hiding this comment.
Parsing pytest's stdout/stderr text is brittle and can easily break if pytest changes its output format. A more robust approach is to have pytest generate a machine-readable report, such as JUnit XML, using the --junit-xml=path/to/report.xml flag. You can then parse this XML file to reliably extract test results and failures.
There was a problem hiding this comment.
The logic to get or create an asyncio event loop is a bit complex. Since Python 3.7, asyncio.run() is the recommended, simpler way to run a top-level async function from sync code, as it handles loop creation and teardown automatically. Consider refactoring this to use asyncio.run() for simplicity and better practice.
# 5. 执行 Run (异步)
# 使用 asyncio.run() 来简化异步代码的执行
run_params = {
'user_prompt': query,
'breadth': 4, # 广度
'depth': 2, # 深度
'is_report': True,
'show_progress': True
}
asyncio.run(workflow.run(**run_params))
Change Summary
Related issue number
Checklist
pre-commit installandpre-commit run --all-filesbefore git commit, and passed lint check.