| Field | Value |
|---|---|
| Source | Built-in (installed by default) |
| Path | skills/research/research-paper-writing |
| Version | 1.1.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | semanticscholar, arxiv, habanero, requests, scipy, numpy, matplotlib, SciencePlots |
| Platforms | linux, macos |
| Tags | Research, Paper Writing, Experiments, ML, AI, NeurIPS, ICML, ICLR, ACL, AAAI, COLM, LaTeX, Citations, Statistical Analysis |
| Related skills | arxiv, ml-paper-writing, subagent-driven-development, plan |
```
┌─────────────────────────────────────────────────────────────┐
│                   RESEARCH PAPER PIPELINE                   │
│                                                             │
│  Phase 0: Project Setup ──► Phase 1: Literature Review      │
│          │                          │                       │
│          ▼                          ▼                       │
│  Phase 2: Experiment        Phase 5: Paper Drafting ◄──┐    │
│           Design                    │                  │    │
│          │                          ▼                  │    │
│          ▼                  Phase 6: Self-Review       │    │
│  Phase 3: Execution &         & Revision ──────────────┘    │
│           Monitoring                │                       │
│          │                          ▼                       │
│          ▼                  Phase 7: Submission             │
│  Phase 4: Analysis ─────► (feeds back to Phase 2 or 5)      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

[CITATION NEEDED].

| Confidence | Action |
|---|---|
| High (clear codebase, well-defined contribution) | Write a complete draft, deliver it, and iterate on feedback |
| Medium (some ambiguity) | Write a draft, flag the uncertain parts, and keep going |
| Low (major unknowns) | Ask 1-2 targeted questions via `clarify`, then draft |
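| Section | Draft autonomously? | Note to attach to the draft |
|---|---|---|
| Abstract | Yes | "Framed the contribution as X. Let me know if this needs adjusting." |
| Introduction | Yes | "Emphasized problem Y. Correct me if this is wrong." |
| Methods | Yes | "Included details A, B, C. Please fill in anything missing." |
| Experiments | Yes | "Highlighted results 1, 2, 3. Let me know if they should be reordered." |
| Related Work | Yes | "Cited papers X, Y, Z. Please add any that are missing." |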
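- `README.md`: project overview and claims
- `results/`, `outputs/`, `experiments/`: existing findings
- `configs/`: experiment configurations
- `.bib` files: existing citations

```
workspace/
  paper/        # LaTeX sources, figures, compiled PDF
  experiments/  # Experiment run scripts
  code/         # Core method implementation
  results/      # Raw experiment results (auto-generated)
  tasks/        # Task/benchmark definitions
  human_eval/   # Human evaluation materials (if needed)
```

```
Add Monte Carlo constrained results (5 runs, Sonnet 4.6, policy memo task)
Add Haiku baseline comparison: autoreason vs refinement baselines at cheap model tier
```

Propose to the scientist: "Based on my understanding, the main contribution is: [one sentence]. The key results show [Y]. Is this the framing you want?"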
Create a structured project plan with the `todo` tool:

Research Paper TODO:
- [ ] Define one-sentence contribution
- [ ] Literature review (related work + baselines)
- [ ] Design core experiments
- [ ] Run experiments
- [ ] Analyze results
- [ ] Write first draft
- [ ] Self-review (simulate reviewers)
- [ ] Revise based on review
- [ ] Submission prep

Compute Budget Checklist:
- [ ] API costs: (model price per token) × (estimated tokens per run) × (number of runs)
- [ ] GPU hours: (time per experiment) × (number of experiments) × (number of seeds)
- [ ] Human evaluation costs: (annotators) × (hours) × (hourly rate)
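- [ ] Total budget ceiling and contingency (add 30-50% for reruns)

| Workflow | Tooling | Best for |
|---|---|---|
| Overleaf | Browser-based | Multiple authors editing simultaneously, no git experience |
| Git + LaTeX | git with a `.gitignore` that excludes auxiliary files | Technical teams that need branch-based review |
| Overleaf + Git sync | Overleaf premium | Best of both: real-time collaboration plus version history |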
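The arithmetic in the Compute Budget Checklist above can be sketched as a small calculator. This is only an illustration: the `BudgetInputs` fields, the `gpu_hourly_rate` conversion from GPU hours to cost, and the default 30% contingency are assumptions chosen for the example, not values prescribed by the skill.

```python
from dataclasses import dataclass


@dataclass
class BudgetInputs:
    """Hypothetical inputs; replace every value with your own estimate."""
    price_per_token: float        # USD per token for the chosen model
    tokens_per_run: int           # estimated tokens consumed by one run
    api_runs: int                 # number of API runs
    hours_per_experiment: float   # GPU time for one experiment
    experiments: int
    seeds: int
    gpu_hourly_rate: float        # USD per GPU hour (assumption, not in the checklist)
    annotators: int
    hours_per_annotator: float
    annotator_hourly_rate: float


def estimate_budget(b: BudgetInputs, contingency: float = 0.3) -> dict:
    """Apply the checklist formulas and add a contingency margin for reruns."""
    api_cost = b.price_per_token * b.tokens_per_run * b.api_runs
    gpu_hours = b.hours_per_experiment * b.experiments * b.seeds
    gpu_cost = gpu_hours * b.gpu_hourly_rate
    human_cost = b.annotators * b.hours_per_annotator * b.annotator_hourly_rate
    subtotal = api_cost + gpu_cost + human_cost
    return {
        "api_cost": api_cost,
        "gpu_hours": gpu_hours,
        "gpu_cost": gpu_cost,
        "human_cost": human_cost,
        "subtotal": subtotal,
        "ceiling": subtotal * (1 + contingency),
    }
```

Passing `contingency=0.5` gives the upper end of the 30-50% rerun margin named in the checklist.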
Author Coordination Checklist:
- [ ] Agree on section ownership (who writes what)
- [ ] Set up shared workspace (Overleaf or git repo)
- [ ] Establish notation conventions (before anyone writes)
- [ ] Schedule internal review rounds (not just at the end)
- [ ] Designate one person for final formatting pass
- [ ] Agree on figure style (colors, fonts, sizes) before creating figures

- A `\method{}` macro for consistent method naming
- Rules for when to use `\citet{}` vs `\citep{}`

For structured paper discovery, load the `arxiv` skill: `skill_view("arxiv")`. It provides arXiv REST API search, the Semantic Scholar citation graph, author profiles, and BibTeX generation.

For broad discovery, use `web_search`; to fetch a specific paper, use `web_extract`:

```python
# Via web_search:
web_search("[main technique] + [application domain] site:arxiv.org")
web_search("[baseline method] comparison ICML NeurIPS 2024")
# Via web_extract (for a specific paper):
web_extract("https://arxiv.org/abs/2303.17651")
```

Search queries:
- "[main technique] + [application domain]"
- "[baseline method] comparison"
- "[problem name] state-of-the-art"
- Author names from existing citations

Iterative Literature Search:
Round 1 (Breadth): 4-6 parallel queries covering different angles
- "[method] + [domain]"
- "[problem name] state-of-the-art 2024 2025"
- "[baseline method] comparison"
- "[alternative approach] vs [your approach]"
→ Collect papers, extract key concepts and terminology
Round 2 (Depth): Generate follow-up queries from Round 1 learnings
- New terminology discovered in Round 1 papers
- Papers cited by the most relevant Round 1 results
- Contradictory findings that need investigation
→ Collect papers, identify remaining gaps
Round 3 (Targeted): Fill specific gaps
- Missing baselines identified in Rounds 1-2
- Concurrent work (last 6 months, same problem)
- Key negative results or failed approaches
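→ Stop when new queries return mostly papers you've already seen

Delegate each round's queries in parallel with `delegate_task`. Collect the results, deduplicate them, then generate the next round's queries from the combined findings.

Citation Verification (MANDATORY per citation):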
1. SEARCH → Query Semantic Scholar or Exa MCP with specific keywords
2. VERIFY → Confirm paper exists in 2+ sources (Semantic Scholar + arXiv/CrossRef)
3. RETRIEVE → Get BibTeX via DOI content negotiation (programmatically, not from memory)
4. VALIDATE → Confirm the claim you're citing actually appears in the paper
5. ADD → Add verified BibTeX to bibliography
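If ANY step fails → mark as [CITATION NEEDED], inform scientist

See `references/citation-workflow.md` for the `CitationManager` class.

| Claim | Experiment | Expected evidence |
|---|---|---|
| "Our method outperforms the baselines" | Main comparison (Table 1) | Win rate, statistical significance |
| "The effect is larger on weaker models" | Model scaling study | Monotonically increasing curve |
| "Convergence requires scope constraints" | Constrained vs unconstrained | Convergence rate comparison |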
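For the "win rate, statistical significance" evidence in the first row of the table above, one simple option is an exact two-sided sign test on pairwise wins and losses. The sketch below uses only the standard library. The function names are illustrative, ties are assumed to be dropped before counting, and the null hypothesis is that both methods are equally likely to win. `scipy.stats.binomtest` offers a library version of the same test.

```python
from math import comb


def win_rate(wins: int, losses: int) -> float:
    """Fraction of decided comparisons won by the method (ties excluded)."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one decided comparison")
    return wins / n


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided binomial test against p = 0.5."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one decided comparison")
    k = max(wins, losses)
    # Probability of a result at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double for the two-sided test; cap at 1 for balanced outcomes.
    return min(1.0, 2 * tail)
```

With small numbers of comparisons the test has little power, so report the raw counts alongside the p-value.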
```
results/<experiment>/
  <task>/
    <strategy>/
      final_output.md   # Final result
      history.json      # Full trajectory
      pass_01/          # Per-iteration artifacts
        version_a.md
        version_b.md
        critic.md
```

```
run_experiment.py        # Core experiment runner
run_baselines.py         # Baseline comparison
run_comparison_judge.py  # Blind evaluation
analyze_results.py       # Statistical analysis
make_charts.py           # Visualization
```

| Decision | Options | Guidance |
|---|---|---|
| Annotator type | Experts, crowdworkers, end users | Match what your claim requires |
| Scale | Likert (1-5), pairwise comparison, ranking | For LLM outputs, pairwise comparison is more reliable than Likert |
| Sample size | Items per annotator and total items | Power analysis, or at minimum 100 items and 3+ annotators |
| Agreement metric | Cohen's kappa, Krippendorff's alpha, ICC | Use Krippendorff's alpha for more than two annotators; also report raw agreement |
| Platform | Prolific, MTurk, in-house team | Prolific for quality; MTurk for scale; in-house team for domain expertise |
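As a minimal illustration of the agreement metrics in the table above, the sketch below computes raw agreement and Cohen's kappa for exactly two annotators labelling the same items. It is a standard-library sketch with illustrative function names. It does not cover Krippendorff's alpha or ICC, which need a dedicated library once there are more than two annotators.

```python
from collections import Counter
from typing import Hashable, Sequence


def raw_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two annotators gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = raw_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    p_e = sum(count_a[l] * count_b[l] for l in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)
```

For pairwise comparisons the labels can simply be which output each annotator preferred, for example `"A"` or `"B"`.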
- [ ] Clear task description with examples (good AND bad)
- [ ] Decision criteria for ambiguous cases
- [ ] At least 2 worked examples per category
- [ ] Attention checks / gold standard items (10-15% of total)
- [ ] Qualification task or screening round
- [ ] Estimated time per item and fair compensation (>= local minimum wage)
- [ ] IRB/ethics review if required by your institution

`nohup`:

Monitor Prompt Template:
1. Check if process is still running: ps aux | grep <pattern>
2. Read last 30 lines of log: tail -30 <logfile>
3. Check for completed results: ls <result_dir>
4. If results exist, read and report: cat <result_file>
5. If all done, commit: git add -A && git commit -m "<descriptive message>" && git push
6. Report in structured format (tables with key metrics)
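7. Answer the key analytical question for this experiment

Use `[SILENT]` to suppress notifications to the user; report only when there is something new.

| Failure | Detection | Recovery |
|---|---|---|
| API rate limit / credits exhausted | 402/429 errors in the log | Wait, then rerun (the script skips completed work) |
| Process crash | PID gone, results incomplete | Rerun from the last checkpoint |
| Timeout on a hard problem | Process stuck, no progress in the log | Kill it and skip; record this in the results |
| Wrong model ID | Log errors that mention the model name | Fix the ID and rerun |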
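The detection column of the failure table above can be approximated with a small classifier over the log tail and process state. This is a rough sketch under stated assumptions: the status names, the 30-minute `stall_after` default, and the bare `402`/`429` pattern are illustrative choices, and wrong-model-ID detection is left out because its log format is not specified here. `pid_alive` relies on POSIX signal semantics, which matches the linux and macos platforms listed for this skill.

```python
import os
import re
from typing import Iterable

# Crude heuristic: a standalone 402 or 429 token in a log line.
# It can also match unrelated numbers, so treat hits as hints.
RATE_LIMIT_RE = re.compile(r"\b(?:402|429)\b")


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def classify_run(log_tail: Iterable[str], alive: bool, results_complete: bool,
                 idle_seconds: float, stall_after: float = 1800.0) -> str:
    """Map observed state to a status, checking the most specific signals first."""
    if results_complete:
        return "done"
    if any(RATE_LIMIT_RE.search(line) for line in log_tail):
        return "rate_limited"   # wait, then rerun
    if not alive:
        return "crashed"        # rerun from the last checkpoint
    if idle_seconds > stall_after:
        return "stalled"        # kill, skip, and record in the results
    return "running"
```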
`experiment_journal.jsonl` (append one entry per experiment attempt):

```json
{
  "id": "exp_003",
  "parent": "exp_001",
  "timestamp": "2025-05-10T14:30:00Z",
  "hypothesis": "Adding scope constraints will fix convergence failure from exp_001",
  "plan": "Re-run autoreason with max_tokens=2000 and fixed structure template",
  "config": {"model": "haiku", "strategy": "autoreason", "max_tokens": 2000},
  "status": "completed",
  "result_path": "results/exp_003/",
  "key_metrics": {"win_rate": 0.85, "convergence_rounds": 3},
  "analysis": "Scope constraints fixed convergence. Win rate jumped from 0.42 to 0.85.",
  "next_steps": ["Try same constraints on Sonnet", "Test without structure template"],
  "figures": ["figures/exp003_convergence.pdf"]
}
```

| Situation | Action | Suitable venues |
|---|---|---|
| The hypothesis was wrong, but for an informative reason | Frame the paper around the analysis of why | NeurIPS, ICML (if the analysis is rigorous) |
| The method does not beat the baselines but reveals something new | Reframe the contribution as understanding/analysis | ICLR (values understanding), workshop papers |
| A clean negative result against a popular claim | Write it up; the field needs to know | NeurIPS Datasets & Benchmarks, TMLR, workshops |
| Inconclusive results, no clear story | Pivot: run different experiments or reframe | Do not force a paper that does not hold up |
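The `experiment_journal.jsonl` format shown earlier links each attempt to its `parent`, so the history of an experiment can be reconstructed by following those links. The sketch below is one possible reading of that format. The helper names are hypothetical, and it assumes each entry is written as a single JSON line with an `id` and an optional `parent`.

```python
import json
from pathlib import Path


def append_entry(journal: Path, entry: dict) -> None:
    """Append one experiment attempt as a single JSON line."""
    with journal.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_entries(journal: Path) -> dict:
    """Read the journal into a dict keyed by experiment id."""
    entries = {}
    with journal.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entry = json.loads(line)
                entries[entry["id"]] = entry
    return entries


def lineage(entries: dict, exp_id: str) -> list:
    """Follow parent links from exp_id back to the root, returned oldest first."""
    chain, seen = [], set()
    current = exp_id
    while current is not None and current in entries and current not in seen:
        seen.add(current)  # guards against accidental parent cycles
        chain.append(current)
        current = entries[current].get("parent")
    return list(reversed(chain))
```

Walking the lineage of a successful run is a quick way to collect the failed attempts and pivots that belong in the paper's analysis.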
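- `plt.savefig('fig.pdf')`
- The `booktabs` LaTeX package

| Situation | Action |
|---|---|
| Core claims supported, results significant | Proceed to Phase 5 (writing) |
| Results inconclusive, more data needed | Return to Phase 2 (design) |
| Unexpected finding suggests a new direction | Return to Phase 2 (design) |
| Missing an ablation that reviewers will ask for | Run it, then proceed to Phase 5 |
| All experiments finished but some failed | Document the failures, then proceed to Phase 5 |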
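Since results tables are meant to use the `booktabs` LaTeX package, one way to keep the numbers in the paper tied to the actual results is to generate the table source from the result data. The sketch below is a hypothetical helper, not part of the skill's scripts. It emits `\toprule`, `\midrule`, and `\bottomrule`, formats floats to two decimals, and escapes a few common special characters in text cells.

```python
def _cell(value) -> str:
    """Format one table cell: two decimals for floats, escaped text otherwise."""
    if isinstance(value, float):
        return f"{value:.2f}"
    text = str(value)
    for ch in ("&", "%", "_"):
        text = text.replace(ch, "\\" + ch)
    return text


def to_booktabs(header: list, rows: list, caption: str, label: str) -> str:
    """Render rows as a LaTeX table that uses booktabs rules."""
    if any(len(row) != len(header) for row in rows):
        raise ValueError("every row must have one value per header column")
    col_spec = "l" + "c" * (len(header) - 1)  # first column left, rest centered
    lines = [
        "\\begin{table}[t]",
        "\\centering",
        "\\caption{" + caption + "}",
        "\\label{" + label + "}",
        "\\begin{tabular}{" + col_spec + "}",
        "\\toprule",
        " & ".join(_cell(h) for h in header) + " \\\\",
        "\\midrule",
    ]
    for row in rows:
        lines.append(" & ".join(_cell(v) for v in row) + " \\\\")
    lines += ["\\bottomrule", "\\end{tabular}", "\\end{table}"]
    return "\n".join(lines)
```

The caption and label are inserted verbatim, so any LaTeX special characters in them need to be escaped by the caller.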
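`experiment_log.md`, structured as follows:

From `experiment_log.md` and the LaTeX template, generate a first draft grounded in the actual results. Without this bridge, the writing agent has to parse raw JSON/CSV files and infer the story, which is a common source of fabricated or misreported numbers.

| Your situation | Strategy | Why |
|---|---|---|
| Mid-tier model + constrained task | Autoreason | The sweet spot. The generation-evaluation gap is widest. Baselines actively damage the weaker model's output. |
| Mid-tier model + open-ended task | Autoreason with added scope constraints | Add fixed facts, structure, or deliverables to bound the space of improvements. |
| Frontier model + constrained task | Autoreason | Wins on 2 of 3 constrained tasks even with frontier models. |
| Frontier model + unconstrained task | Critique-and-revise or single pass | Autoreason ranks last. The model's self-evaluation is already good enough. |
| Concrete technical task (system design) | Critique-and-revise | A direct find-and-fix loop is more efficient. |
| Template-filling task (only one correct structure) | Single pass or conservative strategy | The decision space is tiny. Iteration adds no value. |
| Code with test cases | Autoreason (code variant) | Structured analysis of the failure before attempting a fix. Recovery rate 62% vs 43%. |
| Very weak model (Llama 8B class) | Single pass | The model is too weak to generate diverse candidates. Invest in generation quality instead. |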
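```
Model Tier         │ Generation │ Self-Eval │ Gap    │ Autoreason Value
───────────────────┼────────────┼───────────┼────────┼──────────────────────────────────────────
Weak (Llama 8B)    │ Poor       │ Poor      │ Small  │ None: can't generate diverse candidates
Mid (Haiku 3.5)    │ Decent     │ Poor      │ LARGE  │ MAXIMUM: 42/42 perfect Borda
Mid (Gemini Flash) │ Decent     │ Moderate  │ Large  │ High: wins 2/3
Strong (Sonnet 4)  │ Good       │ Decent    │ Medium │ Moderate: wins 3/5
Frontier (S4.6)    │ Excellent  │ Good      │ Small  │ Only with constraints
```

| Failure | Detection | Fix |
|---|---|---|
| No convergence (A never wins) | A wins <15% of 20+ iterations | Add scope constraints to the task |
| Synthesis drift | Word count grows without bound | Constrain the structure and deliverables |
| Degrades below single pass | Baseline scores higher than the iterated output | Switch to single pass; the model may be too weak |
| Overfitting (code) | High pass rate on public tests, low on private tests | Use structured analysis rather than relying on test feedback alone |
| Broken judges | Parse failures reduce the panel below 3 | Fix the parser before continuing |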
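Three of the detection rules in the failure table above are simple enough to check automatically after each iteration. The sketch below encodes them under stated assumptions: `winners` is a hypothetical list holding the winning candidate label per iteration, the 15% and 20-iteration thresholds come from the table, and the `max_growth` factor for drift is an illustrative stand-in for "grows without bound".

```python
from typing import Sequence


def is_non_converging(winners: Sequence[str], incumbent: str = "A",
                      min_iterations: int = 20, threshold: float = 0.15) -> bool:
    """True when the incumbent wins fewer than `threshold` of 20+ iterations."""
    if len(winners) < min_iterations:
        return False  # not enough evidence yet
    rate = sum(w == incumbent for w in winners) / len(winners)
    return rate < threshold


def is_drifting(word_counts: Sequence[int], max_growth: float = 2.0) -> bool:
    """Flag synthesis drift when the latest output is far longer than the first.

    The growth factor is an assumption; the table only says the word count
    grows without bound.
    """
    return len(word_counts) >= 2 and word_counts[-1] > max_growth * word_counts[0]


def panel_is_valid(parsed_votes: Sequence[object], min_panel: int = 3) -> bool:
    """Judges whose output failed to parse are passed in as None."""
    return sum(v is not None for v in parsed_votes) >= min_panel
```

When `panel_is_valid` returns False, the table's advice is to fix the parser before continuing instead of accepting the reduced panel.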
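| Drafting task | Load into context | Do NOT load |
|---|---|---|
| Writing the introduction | `experiment_log.md`, contribution statement, abstracts of the 5-10 most relevant papers | Raw result JSON, full experiment scripts, all literature notes |
| Writing methods | Experiment configs, pseudocode, architecture description | Raw logs, results from other experiments |
| Writing results | `experiment_log.md`, results summary table, figure list | Full analysis scripts, intermediate data |
| Writing related work | Organized citation notes (output of Step 1.4), `.bib` file | Experiment files, raw PDFs |
| Revising | Full paper draft, the specific reviewer comments | Everything else |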
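- `experiment_log.md` is the primary context bridge: it summarizes everything the writing needs without loading raw data files (see Step 4.6)
- A `context/` directory holding pre-compressed summaries:

```
context/
  contribution.md        # 1 sentence
  experiment_summary.md  # Key results table (from experiment_log.md)
  literature_map.md      # Organized citation notes
  figure_inventory.md    # List of figures with descriptions
```

| Pillar | Description | Test |
|---|---|---|
| What | 1-3 specific, novel claims | Can you state it in one sentence? |
| Why | Rigorous empirical evidence | Do your experiments distinguish your hypothesis from the alternatives? |
| So what | Why readers should care | Does this connect to a problem the community recognizes? |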
Written with the `ml-paper-writing` skill.

| Source | Key contribution | Link |
|---|---|---|
| Neel Nanda (Google DeepMind) | Narrative principles, the What/Why/So What framework | How to Write ML Papers |
| Sebastian Farquhar (DeepMind) | 5-sentence abstract formula | How to Write ML Papers |
| Gopen & Swan | 7 principles of reader expectations | Science of Scientific Writing |
| Zachary Lipton | Word choice, eliminating vague wording | Heuristics for Scientific Writing |
| Jacob Steinhardt (UC Berkeley) | Precision, consistent terminology | Writing Tips |
| Ethan Perez (Anthropic) | Micro-level clarity tips | Easy Paper Writing Tips |
| Andrej Karpathy | Single-contribution focus | Various lectures |
Paper Writing Checklist:
- [ ] Step 1: Define the one-sentence contribution
- [ ] Step 2: Draft Figure 1 (core idea or most compelling result)
- [ ] Step 3: Draft abstract (5-sentence formula)
- [ ] Step 4: Draft introduction (1-1.5 pages max)
- [ ] Step 5: Draft methods
- [ ] Step 6: Draft experiments & results
- [ ] Step 7: Draft related work
- [ ] Step 8: Draft conclusion & discussion
- [ ] Step 9: Draft limitations (REQUIRED by all venues)
- [ ] Step 10: Plan appendix (proofs, extra experiments, details)
- [ ] Step 11: Complete paper checklist
- [ ] Step 12: Final review

Second-pass refinement prompt (per section):
"Review the [SECTION] in the context of the complete paper.
- Does it fit with the rest of the paper? Are there redundancies with other sections?
- Is terminology consistent with Introduction and Methods?
- Can anything be cut without weakening the message?
- Does the narrative flow from the previous section and into the next?
Make minimal, targeted edits. Do not rewrite from scratch."

LaTeX Quality Checklist (verify after every edit):
- [ ] No unenclosed math symbols ($ signs balanced)
- [ ] Only reference figures/tables that exist (\ref matches \label)
- [ ] No fabricated citations (\cite matches entries in .bib)
- [ ] Every \begin{env} has matching \end{env} (especially figure, table, algorithm)
- [ ] No HTML contamination (</end{figure}> instead of \end{figure})
- [ ] No unescaped underscores outside math mode (use \_ in text)
- [ ] No duplicate \label definitions
- [ ] No duplicate section headers
- [ ] Numbers in text match actual experimental results
- [ ] All figures have captions and labels
- [ ] No overly long lines that cause overfull hbox warnings

1. What you achieved: "We introduce...", "We prove...", "We demonstrate..."
2. Why this is hard and important
3. How you do it (with specialist keywords for discoverability)
4. What evidence you have
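5. Your most remarkable number/result

| Figure 1 type | When to use | Example |
|---|---|---|
| Method diagram | New architecture or pipeline | TikZ flowchart of the system |
| Results teaser | One striking result tells the whole story | Bar chart: "our method vs baselines" with a clear gap |
| Problem illustration | The problem is not intuitive | Before/after comparison showing the failure mode you fix |
| Conceptual diagram | An abstract contribution needs visual support | 2×2 matrix of method properties |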
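Several items in the LaTeX Quality Checklist earlier in this section (references matching labels, citations matching `.bib` entries, duplicate labels, balanced environments) can be checked mechanically. The sketch below is a regex-based approximation with hypothetical names. It assumes the whole paper is available as one string, ignores LaTeX comments and verbatim blocks, and does not replace an actual compile.

```python
import re
from collections import Counter

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
REF_RE = re.compile(r"\\(?:ref|cref|Cref|eqref|autoref)\{([^}]+)\}")
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]+)\}")
BIBKEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
ENV_RE = re.compile(r"\\(begin|end)\{([^}]+)\}")


def _keys(groups):
    """Split comma-separated key lists such as \\cite{a,b} into single keys."""
    return {key.strip() for group in groups for key in group.split(",")}


def check_latex(tex: str, bib: str) -> dict:
    """Report checklist violations found in the LaTeX source and bibliography."""
    labels = LABEL_RE.findall(tex)
    refs = _keys(REF_RE.findall(tex))
    cites = _keys(CITE_RE.findall(tex))
    bib_keys = set(BIBKEY_RE.findall(bib))

    # Match every \end against the most recent unclosed \begin.
    stack, unbalanced = [], []
    for kind, name in ENV_RE.findall(tex):
        if kind == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            unbalanced.append(name)
    unbalanced.extend(stack)  # environments that were never closed

    return {
        "undefined_refs": sorted(refs - set(labels)),
        "missing_citations": sorted(cites - bib_keys),
        "duplicate_labels": sorted(l for l, c in Counter(labels).items() if c > 1),
        "unbalanced_envs": sorted(set(unbalanced)),
    }
```

An empty list for every key means these four checks passed; the remaining checklist items still need a compile and a read-through.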
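| Appendix section | Contents |
|---|---|
| Proofs and derivations | Full proofs that are too long for the main text. The main text can state the theorem and note "proof in Appendix A". |
| Additional experiments | Ablations, scaling curves, per-dataset breakdowns, hyperparameter sensitivity |
| Implementation details | Full hyperparameter tables, training details, hardware specs, random seeds |
| Dataset documentation | Data collection process, annotation guidelines, license, preprocessing |
| Prompts and templates | The exact prompts used (for LLM-based methods), evaluation templates |
| Human evaluation | Screenshots of the annotation interface, instructions given to annotators, IRB details |
| Additional figures | Per-task breakdowns, trajectory visualizations, failure case examples |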
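The `\appendix` command, followed by `\section{A: Proofs}` and so on.

| Cutting strategy | Saves | Risk |
|---|---|---|
| Move proofs to the appendix | 0.5-2 pages | Low: standard practice |
| Compress related work | 0.5-1 page | Medium: may drop key citations |
| Combine tables with subfigures | 0.25-0.5 page | Low: usually improves readability |
| Use `\vspace{-Xpt}` sparingly | 0.1-0.3 page | Low when subtle, high when obvious |
| Remove qualitative examples | 0.5-1 page | Medium: reviewers like examples |
| Shrink figures | 0.25-0.5 page | High: figures must remain readable |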
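`\small`/`\footnotesize`.

| Component | Content | Required by |
|---|---|---|
| Positive societal impact | How your work benefits society | NeurIPS, ICML |
| Potential negative impact | Misuse risks, dual-use concerns, failure modes | NeurIPS, ICML |
| Fairness and bias | Does your method or data have known biases? | All venues (implicitly) |
| Environmental impact | Compute carbon footprint of large-scale training | ICML; increasingly NeurIPS |
| Privacy | Does your work use or enable processing of personal data? | ACL, NeurIPS |
| LLM disclosure | Was AI used in the writing or the experiments? | ICLR (mandatory), ACL |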
Dataset Documentation (Appendix):
- Motivation: Why was this dataset created? What task does it support?
- Composition: What are the instances? How many? What data types?
- Collection: How was data collected? What was the source?
- Preprocessing: What cleaning/filtering was applied?
- Distribution: How is the dataset distributed? Under what license?
- Maintenance: Who maintains it? How to report issues?
- Ethical considerations: Contains personal data? Consent obtained?
  Potential for harm? Known biases?

Model Card (Appendix):
- Model details: Architecture, training data, training procedure
- Intended use: Primary use cases, out-of-scope uses
- Metrics: Evaluation metrics and results on benchmarks
- Ethical considerations: Known biases, fairness evaluations
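- Limitations: Known failure modes, domains where model underperforms

| Principle | Rule |
|---|---|
| Subject-verb proximity | Keep the subject and verb close together |
| Stress position | Put the emphasis at the end of the sentence |
| Topic position | Put context first, new information after |
| Old before new | Familiar information → unfamiliar information |
| One unit, one function | Each paragraph makes exactly one point |
| Action in the verb | Use verbs, not nominalizations |
| Context before new | Set the scene before presenting the content |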
Template Setup Checklist:
- [ ] Step 1: Copy entire template directory to new project
- [ ] Step 2: Verify template compiles as-is (before any changes)
- [ ] Step 3: Read the template's example content to understand structure
- [ ] Step 4: Replace example content section by section
- [ ] Step 5: Use template macros (check preamble for \newcommand definitions)
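- [ ] Step 6: Clean up template artifacts only at the end

Install with `tlmgr install <package>`.

| Pitfall | Problem | Solution |
|---|---|---|
| Copying only the `.tex` file | `.sty` is missing, so it won't compile | Copy the entire directory |
| Modifying the `.sty` file | Breaks the venue formatting | Never edit style files |
| Adding packages at will | Conflicts, breaks the template | Add packages only when necessary |
| Deleting template content too early | Lose the formatting reference | Keep it as comments until you are done |
| Compiling infrequently | Errors accumulate | Compile after every section |
| Using raster PNG for figures | Blurry in the paper | Always use vector PDF via `savefig('fig.pdf')` |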
| Venue | Main file | Style file | Page limit |
|---|---|---|---|
| NeurIPS 2025 | main.tex | neurips.sty | 9 pages |
| ICML 2026 | example_paper.tex | icml2026.sty | 8 pages |
| ICLR 2026 | iclr2026_conference.tex | iclr2026_conference.sty | 9 pages |
| ACL 2025 | acl_latex.tex | acl.sty | 8 pages (long paper) |
| AAAI 2026 | aaai2026-unified-template.tex | aaai2026.sty | 7 pages |
| COLM 2025 | colm2025_conference.tex | colm2025_conference.sty | 9 pages |
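`templates/` directory. For compilation setup (VS Code, CLI, Overleaf, other IDEs), see `templates/README.md`.

- `booktabs` for professional formatting
- `plt.savefig('fig.pdf')`

- `microtype` is the single package with the largest effect on visual quality. It adjusts character spacing at a sub-pixel level. Always include it.
- `siunitx` handles decimal alignment in tables through the `S` column type, which removes the need for manual spacing.
- `cleveref` must be loaded after `hyperref`. Most venue `.sty` files load `hyperref`, so put `cleveref` last.
- (`algorithm`, `amsmath`, `graphicx`). Do not load them a second time.

`siunitx` makes number-heavy tables considerably easier to read: the `S` column type aligns on the decimal point automatically, and headers wrapped in `{}` skip the alignment.

`\cref{fig:results}` → "Figure 1", `\cref{fig:results-a}` → "Figure 1a".

- `figsize=(3.5, 2.5)`: fits one column
- `figsize=(7.0, 3.0)`: spans two columns
- `figsize=(3.5, 3.5)`: for heatmaps and confusion matrices

You are an expert reviewer for [VENUE]. You are critical and thorough.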
If a paper has weaknesses or you are unsure about a claim, flag it clearly
and reflect that in your scores. Do not give the benefit of the doubt.
Review this paper according to the official reviewer guidelines. Evaluate:
1. Soundness (are claims well-supported? are baselines fair and strong?)
2. Clarity (is the paper well-written? could an expert reproduce it?)
3. Significance (does this matter to the community?)
4. Originality (new insights, not just incremental combination?)
Provide your review as structured JSON:
{
"summary": "2-3 sentence summary",
"strengths": ["strength 1", "strength 2", ...],
"weaknesses": ["weakness 1 (most critical)", "weakness 2", ...],
"questions": ["question for authors 1", ...],
"missing_references": ["paper that should be cited", ...],
"soundness": 1-4,
"presentation": 1-4,
"contribution": 1-4,
"overall": 1-10,
"confidence": 1-5
}

You are an Area Chair at [VENUE]. You have received [N] independent reviews
of a paper. Your job is to:
1. Identify consensus strengths and weaknesses across reviewers
2. Resolve disagreements by examining the paper directly
3. Produce a meta-review that represents the aggregate judgment
4. Use AVERAGED numerical scores across all reviews
Be conservative: if reviewers disagree on whether a weakness is serious,
treat it as serious until the authors address it.
Reviews:
[review_1]
[review_2]
...
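The Area Chair prompt above asks for averaged numerical scores across all reviews. Assuming each reviewer returns the structured JSON from the reviewer prompt, the averaging step might look like the sketch below. The function name is illustrative, and a review that lacks a numeric value for a field is skipped for that field only.

```python
import json
from statistics import mean

# Score fields named in the reviewer prompt's JSON schema.
SCORE_FIELDS = ("soundness", "presentation", "contribution", "overall", "confidence")


def average_scores(reviews_json: list) -> dict:
    """Average each numeric score across reviews given as JSON strings."""
    reviews = [json.loads(raw) for raw in reviews_json]
    averaged = {}
    for field in SCORE_FIELDS:
        values = [r[field] for r in reviews if isinstance(r.get(field), (int, float))]
        if values:
            averaged[field] = round(mean(values), 2)
    return averaged
```

Computing the averages in code and passing them into the meta-review prompt avoids relying on the model's own arithmetic.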