
arXiv Chinese Summary Methodology

This file records the reusable workflow for daily arXiv paper summaries in this repository.

Input Handling

  • Input is a dated list of arXiv links, usually https://arxiv.org/abs/<id> or https://arxiv.org/pdf/<id>.
  • Parse the arXiv ID and strip the version suffix for the final output; for example, 2604.22986v1 becomes 2604.22986.
  • Final links must always use https://arxiv.org/abs/<id>.
  • Preserve the input date as the top-level heading: ## YYYY-MM-DD.
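Below is a minimal Python sketch of the input rules above. It assumes only new-style IDs (YYMM.NNNNN). The regex and the function names normalize_arxiv_link and date_heading are illustrative; they are not part of any existing tool in this repository.

```python
import re
from datetime import date

# New-style arXiv IDs (YYMM.NNNNN) with an optional version suffix such as "v1".
# Assumption: old-style IDs such as "hep-th/9901001" are not handled here.
ARXIV_LINK = re.compile(
    r"^https?://arxiv\.org/(?:abs|pdf)/(?P<id>\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?/?$"
)


def normalize_arxiv_link(url: str) -> str:
    """Turn an abs or pdf link into the canonical abs link without a version suffix."""
    match = ARXIV_LINK.match(url.strip())
    if match is None:
        raise ValueError(f"not a recognized arXiv abs/pdf link: {url!r}")
    return f"https://arxiv.org/abs/{match.group('id')}"


def date_heading(day: str) -> str:
    """Validate an ISO date string (YYYY-MM-DD) and format it as the output heading."""
    return f"## {date.fromisoformat(day).isoformat()}"
```

For example, normalize_arxiv_link("https://arxiv.org/pdf/2604.22986v1") returns https://arxiv.org/abs/2604.22986.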

Retrieval Priority

For each paper, try sources in this order:

  1. arXiv HTML: https://arxiv.org/html/<id>
  2. ar5iv HTML: https://ar5iv.labs.arxiv.org/html/<id>
  3. PDF text: https://arxiv.org/pdf/<id>, extracted with pdftotext or another PDF text tool.
  4. arXiv abstract page, only if full text is unavailable.

When arXiv HTML is available, prefer it over the PDF: it is faster to process and avoids PDF extraction errors. If the HTML is malformed or clearly incomplete, fall back to the next source instead of trusting it.
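The fallback order can be sketched as a loop over the four sources. This is a hypothetical helper. The caller is assumed to supply fetch (for example, a urllib download plus pdftotext for the PDF). looks_complete is an arbitrary heuristic that stands in for judging whether the HTML is malformed or incomplete.

```python
from typing import Callable, Optional, Tuple

# Retrieval order from the list above; "{id}" is the version-free arXiv ID.
SOURCES = [
    ("arxiv_html", "https://arxiv.org/html/{id}"),
    ("ar5iv_html", "https://ar5iv.labs.arxiv.org/html/{id}"),
    ("pdf", "https://arxiv.org/pdf/{id}"),
    ("abstract", "https://arxiv.org/abs/{id}"),
]


def looks_complete(text: str, min_chars: int = 5000) -> bool:
    """Hypothetical completeness heuristic: long enough and has a conclusion-like section."""
    lowered = text.lower()
    return len(text) >= min_chars and ("conclusion" in lowered or "summary" in lowered)


def retrieve(
    paper_id: str,
    fetch: Callable[[str], Optional[str]],
    is_usable: Callable[[str], bool] = looks_complete,
) -> Optional[Tuple[str, str]]:
    """Return (source_name, text) from the first usable source, or None.

    `fetch` is assumed to return plain text for a URL (running PDF extraction
    for the pdf source) and to raise OSError on network or HTTP failures.
    """
    for name, template in SOURCES:
        try:
            text = fetch(template.format(id=paper_id))
        except OSError:
            continue  # unreachable source: fall back to the next one
        if not text:
            continue
        # The abstract page is the last resort, so it is accepted even though it is short.
        if name == "abstract" or is_usable(text):
            return name, text
    return None
```

A result whose source name is "abstract" is the signal to write that summary conservatively.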

Cache Layout

  • Keep all paper retrieval caches under .paper_cache/.
  • Use one subfolder per date, preferably .paper_cache/YYYYMMDD/.
  • Do not create date-specific cache folders at the repository root such as .paper_cache_YYYYMMDD.
  • Store downloaded HTML, extracted HTML text, PDFs, and PDF text in that date subfolder.
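One way to keep the cache layout consistent is to build every path from a single helper. The file-name suffixes below are an assumption; the rules only fix the .paper_cache/YYYYMMDD/ folder.

```python
from datetime import date
from pathlib import Path

CACHE_ROOT = Path(".paper_cache")

# Hypothetical file-name scheme; the rules above only fix the folder layout.
SUFFIXES = {
    "html": ".html",
    "html_text": ".html.txt",
    "pdf": ".pdf",
    "pdf_text": ".pdf.txt",
}


def cache_dir(day: date, root: Path = CACHE_ROOT) -> Path:
    """Date subfolder such as .paper_cache/20260428/ (never .paper_cache_20260428 at the root)."""
    return root / day.strftime("%Y%m%d")


def cache_file(day: date, paper_id: str, kind: str, root: Path = CACHE_ROOT) -> Path:
    """Path for one cached artifact of one paper inside the date subfolder."""
    return cache_dir(day, root) / f"{paper_id}{SUFFIXES[kind]}"


def ensure_cache_dir(day: date, root: Path = CACHE_ROOT) -> Path:
    """Create the date subfolder on first use and return it."""
    path = cache_dir(day, root)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With these paths, Poppler's pdftotext (invoked as pdftotext <pdf-file> <text-file>) can write the extracted text straight into the pdf_text file.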

Extraction Checklist

For each paper, confirm from the available text:

  • Exact English title.
  • arXiv abs link.
  • Research field and 2-5 English keywords.
  • Research problem or goal.
  • Method, model, observation, simulation, theoretical framework, or experimental setup.
  • Data/sample/instrument details when relevant.
  • Main conclusions and contributions.
  • Project, code, data, or documentation links only when explicitly present and relevant.

Do not infer details from the title alone. If only the abstract is available, write conservatively and do not claim full-paper details.
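The checklist maps naturally onto a per-paper record, which also makes the abstract-only case explicit. The PaperNote class and its field names are a hypothetical sketch, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import List

# Checklist items that every note must fill; data details and links are optional.
REQUIRED = ("title", "abs_link", "research_field", "problem", "method", "conclusions")


@dataclass
class PaperNote:
    title: str = ""  # exact English title
    abs_link: str = ""  # https://arxiv.org/abs/<id>
    research_field: str = ""
    keywords: List[str] = field(default_factory=list)  # 2-5 English keywords
    problem: str = ""  # research problem or goal
    method: str = ""  # method, model, observation, simulation, theory, or setup
    data_details: str = ""  # data/sample/instrument details, when relevant
    conclusions: str = ""  # main conclusions and contributions
    links: List[str] = field(default_factory=list)  # only links present in the paper
    abstract_only: bool = False  # True when full text was unavailable

    def missing(self) -> List[str]:
        """Names of checklist items that are still empty or out of range."""
        gaps = [name for name in REQUIRED if not getattr(self, name).strip()]
        if not 2 <= len(self.keywords) <= 5:
            gaps.append("keywords")
        return gaps
```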

Reading Strategy

  • Read the title, abstract, introduction, methods/data, results, discussion/conclusion, data availability, and acknowledgments/link sections.
  • Use text search for terms such as Conclusion, Summary, Results, Data availability, github, zenodo, code, software, repository, and http.
  • For numerical claims, verify the exact values in the results or conclusion sections.
  • For tool, model, and data papers, check that line breaks in the extracted text have not split or corrupted URLs.
  • When there are several papers, use parallel subagents to draft independent structured notes for each paper, then verify the important numbers and links locally before the final output.
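The text search and the URL check can be scripted roughly as follows. The term list comes from the bullet above. The rule for spotting a URL split across lines is a heuristic assumption: a URL that ends a line with /, _ or - and is followed by non-blank text. Flagged candidates still need a manual look.

```python
import re
from typing import Dict, List, Tuple

SEARCH_TERMS = [
    "Conclusion", "Summary", "Results", "Data availability", "github",
    "zenodo", "code", "software", "repository", "http",
]

# A URL that runs to the end of a line and ends in a character that often
# precedes a line break inside a URL in extracted PDF text (heuristic).
URL_AT_LINE_END = re.compile(r"https?://\S*[/_-]$")


def locate_terms(text: str, terms: List[str] = SEARCH_TERMS) -> Dict[str, List[int]]:
    """Map each term to the 1-based line numbers where it occurs, case-insensitively."""
    hits: Dict[str, List[int]] = {term: [] for term in terms}
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for term in terms:
            if term.lower() in lowered:
                hits[term].append(lineno)
    return hits


def suspect_split_urls(text: str) -> List[Tuple[int, str]]:
    """Return (line number, joined candidate) for URLs that may continue on the next line."""
    lines = text.splitlines()
    suspects = []
    for lineno, (current, following) in enumerate(zip(lines, lines[1:]), start=1):
        match = URL_AT_LINE_END.search(current.rstrip())
        if match and following[:1].strip():
            suspects.append((lineno, match.group() + following.split()[0]))
    return suspects
```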

Keyword Style

  • Keywords are English, comma-separated, 2-5 per paper.
  • Use clean domain terms such as Fast Radio Burst, Cosmology, Solar, Observation, Theory, Simulation, Tool, Deep Learning, LLM, AGN, FRB, QPO, RFI.
  • Do not use hyphenated keywords when a non-hyphenated phrase works.
  • Add Review for review papers, Tool for software/platform papers, Observation for observational results, Theory for theoretical modeling, and Simulation for simulation-heavy papers.
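Here is a small validator for the keyword rules, offered as a sketch. The ASCII test is only a rough proxy for English, and check_keywords and format_keywords are illustrative names.

```python
from typing import List


def check_keywords(keywords: List[str]) -> List[str]:
    """Return rule violations for one paper's keyword list (empty list means OK)."""
    problems = []
    if not 2 <= len(keywords) <= 5:
        problems.append(f"expected 2-5 keywords, got {len(keywords)}")
    for keyword in keywords:
        # Rough proxy for "English": non-ASCII text (e.g. Chinese) is flagged.
        if not keyword.isascii():
            problems.append(f"non-English keyword: {keyword!r}")
        if "-" in keyword:
            problems.append(f"hyphenated keyword: {keyword!r}")
    return problems


def format_keywords(keywords: List[str]) -> str:
    """Join keywords into the comma-separated line used under each title."""
    return ", ".join(keyword.strip() for keyword in keywords)
```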

Output Format

Final output must be Markdown only:

```markdown
## YYYY-MM-DD

1. [Title](https://arxiv.org/abs/xxxx.xxxxx)

   > Keyword1, Keyword2, Keyword3

   Chinese summary content.
```

Rules:

  • Number papers continuously from 1.
  • Use the exact English title.
  • Use only arXiv abs links for title links.
  • Do not output logs, retrieval details, tool output, or a preface.
  • Do not add citation links after every paragraph.
  • Include code/data/project links in the prose only when the paper explicitly provides them and they are useful.
  • If no code/data link exists, say nothing about missing links.
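The output template can be generated instead of typed by hand. This sketch assumes each paper arrives as a (title, abs_link, keywords, summary_paragraphs) tuple, which is a hypothetical shape. The three-space indentation keeps the keyword quote and the summary inside the numbered list item, as in the template above.

```python
from typing import List, Tuple

Paper = Tuple[str, str, List[str], List[str]]  # (title, abs_link, keywords, paragraphs)


def render_day(day: str, papers: List[Paper]) -> str:
    """Render one day's summaries in the Markdown layout described above."""
    lines = [f"## {day}", ""]
    # Number papers continuously from 1.
    for number, (title, link, keywords, paragraphs) in enumerate(papers, start=1):
        lines.append(f"{number}. [{title}]({link})")
        lines.append("")
        lines.append(f"   > {', '.join(keywords)}")
        for paragraph in paragraphs:
            lines.append("")
            lines.append(f"   {paragraph}")
        lines.append("")
    # Exactly one trailing newline, no logs or preface.
    return "\n".join(lines).rstrip() + "\n"
```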

Chinese Summary Style

  • Write directly and densely in Chinese.
  • Cover purpose, method, and conclusion.
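  • Avoid formulaic wording such as 作者如何如何 ("the authors did so-and-so"), 本文如何如何 ("this paper does so-and-so"), 目的是 ("the purpose is"), 方法是 ("the method is"), and 结论是 ("the conclusion is").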
  • Avoid empty praise and marketing language.
  • Avoid overusing quotation marks.
  • Avoid the 不是……而是…… ("not X, but Y") construction unless it is necessary.
  • Use 1-2 paragraphs for ordinary research papers.
  • Use concise bullets only for complex reviews or broad tool/data papers when bullets improve clarity.
  • Preserve important equations with Markdown math, for example $DM$, $RM$, $B_8 \equiv \sigma_8(\Omega_b/0.05)^{1/2}$.
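Some of the style rules can be checked mechanically before a manual read. The phrase list, the quote threshold, and the paragraph limit below are assumptions loosely tuned to the bullets above. A hit is a prompt to rewrite, not proof of a problem.

```python
import re
from typing import List

# Phrases the style rules above ask to avoid. Matching is deliberately loose,
# so hits are prompts for a rewrite rather than hard errors.
FORMULAIC = ["作者", "本文", "目的是", "方法是", "结论是"]
NOT_BUT = re.compile(r"不是.{0,30}?而是")
QUOTE_CHARS = "“”\"「」"


def style_warnings(summary: str, max_quotes: int = 4, max_paragraphs: int = 2) -> List[str]:
    """Flag wording the Chinese style rules discourage; thresholds are arbitrary defaults."""
    warnings = [f"formulaic phrase: {phrase}" for phrase in FORMULAIC if phrase in summary]
    if NOT_BUT.search(summary):
        warnings.append("不是……而是…… construction")
    quotes = sum(summary.count(ch) for ch in QUOTE_CHARS)
    if quotes > max_quotes:
        warnings.append(f"{quotes} quotation marks")
    paragraphs = [p for p in summary.split("\n\n") if p.strip()]
    if len(paragraphs) > max_paragraphs:
        warnings.append(f"{len(paragraphs)} paragraphs (more than {max_paragraphs})")
    return warnings
```

For review or tool papers that use bullets, raise max_paragraphs instead of treating the warning as an error.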

Quality Gate

Before final output, check:

  • Dates and numbering are correct.
  • Titles and arXiv abs links are correct.
  • Keywords are English and not awkwardly hyphenated.
  • Each summary includes purpose, method, and conclusion.
  • Important sample sizes, instruments, models, and numerical results are verified from the text.
  • Tool/data links are included only when explicitly present.
  • No retrieval logs or process notes appear in the final answer.
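Parts of the quality gate are mechanical and can be run on the final Markdown. This is a hypothetical checker. It covers the date heading, continuous numbering, abs-only links, and a few obvious process-note markers. It does not replace checking titles, numbers, and keywords against the paper text.

```python
import re
from typing import List

HEADING = re.compile(r"^## \d{4}-\d{2}-\d{2}$")
ITEM = re.compile(r"^(\d+)\. \[(.+)\]\((\S+)\)$")
ABS_LINK = re.compile(r"^https://arxiv\.org/abs/\d{4}\.\d{4,5}$")
# Hypothetical markers that usually indicate leaked retrieval logs.
LOG_HINTS = ("pdftotext", ".paper_cache", "Traceback")


def quality_gate(markdown: str) -> List[str]:
    """Return a list of mechanical problems found in the final output."""
    problems = []
    lines = markdown.splitlines()
    if not lines or not HEADING.match(lines[0]):
        problems.append("first line must be '## YYYY-MM-DD'")
    expected = 1
    for line in lines:
        item = ITEM.match(line)
        if item:
            number, _title, link = item.groups()
            if int(number) != expected:
                problems.append(f"item {number} should be numbered {expected}")
            if not ABS_LINK.match(link):
                problems.append(f"item {number}: not an arXiv abs link: {link}")
            expected += 1
        if any(hint in line for hint in LOG_HINTS):
            problems.append(f"possible process note: {line.strip()}")
    return problems
```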

Released under the MIT License.
