PPTX
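Use this skill any time a .pptx file is involved in any way (as input, output, or both). This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, such as in an email or summary); editing, modifying, or updating existing presentations; combining or splitting slide files; working with templates, layouts, speaker notes, or comments. Trigger whenever the user mentions "deck," "slides," "presentation," or references a .pptx filename, regardless of what they plan to do with the content afterward. If a .pptx file needs to be opened, created, or touched, use this skill.
Source: content adapted from anthropics/skills (MIT).
Quick Reference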
| Task | Guide |
|---|---|
| Read/analyze content | python -m markitdown presentation.pptx |
| Edit or create from template | Read editing.md |
| Create from scratch | Read pptxgenjs.md |
Reading Content
# Text extraction
python -m markitdown presentation.pptx
# Visual overview
python scripts/thumbnail.py presentation.pptx
# Raw XML
python scripts/office/unpack.py presentation.pptx unpacked/
Editing Workflow
Read editing.md for full details.
- Analyze the template with thumbnail.py
- Unpack -> manipulate slides -> edit content -> clean -> pack
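Creating from Scratch
Read pptxgenjs.md for full details.
Use when no template or reference presentation is available.
Design Ideas
Don't create boring slides. Plain bullets on a white background won't impress anyone. Consider ideas from this list for each slide.
Before Starting
- Pick a bold, content-informed color palette: The palette should feel designed for THIS topic. If swapping your colors into a completely different presentation would still "work," you haven't made specific enough choices.
- Dominance over equality: One color should dominate (60-70% visual weight), with 1-2 supporting tones and one sharp accent. Never give all colors equal weight.
- Dark/light contrast: Dark backgrounds for title + conclusion slides, light backgrounds for content (a "sandwich" structure). Or commit to dark throughout for a premium feel.
- Commit to a visual motif: Pick ONE distinctive element and repeat it: rounded image frames, icons in colored circles, thick single-side borders. Carry it across every slide.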
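Color Palettes
Choose colors that match your topic - don't default to generic blue. Use these palettes as inspiration:
| Theme | Primary | Secondary | Accent |
|---|---|---|---|
| Midnight Executive | 1E2761 (navy) | CADCFC (ice blue) | FFFFFF (white) |
| Forest & Moss | 2C5F2D (forest) | 97BC62 (moss) | F5F5F5 (cream) |
| Coral Energy | F96167 (coral) | F9E795 (gold) | 2F3C7E (navy) |
| Warm Terracotta | B85042 (terracotta) | E7E8D1 (sand) | A7BEAE (sage) |
| Ocean Gradient | 065A82 (deep blue) | 1C7293 (teal) | 21295C (midnight) |
| Charcoal Minimal | 36454F (charcoal) | F2F2F2 (off-white) | 212121 (black) |
| Teal Trust | 028090 (teal) | 00A896 (seafoam) | 02C39A (mint) |
| Berry & Cream | 6D2E46 (berry) | A26769 (dusty rose) | ECE2D0 (cream) |
| Sage Calm | 84B59F (sage) | 69A297 (eucalyptus) | 50808E (slate) |
| Cherry Bold | 990011 (cherry) | FCF6F5 (off-white) | 2F3C7E (navy) |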
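For Each Slide
Every slide needs a visual element - image, chart, icon, or shape. Text-only slides are forgettable.
Layout options:
- Two-column (text on the left, illustration on the right)
- Icon + text rows (icon in a colored circle, bold header, description below)
- 2x2 or 2x3 grid (image on one side, grid of content blocks on the other)
- Half-bleed image (full left or right side) with content overlay
Data display:
- Large stat callouts (big numbers at 60-72pt with small labels below)
- Comparison columns (before/after, pros/cons, side-by-side options)
- Timeline or process flow (numbered steps, arrows)
Visual polish:
- Icons in small colored circles next to section headers
- Italic accent text for key stats or taglines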
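Typography
Choose an interesting font pairing - don't default to Arial. Pick a header font with personality and pair it with a clean body font.
| Header Font | Body Font |
|---|---|
| Georgia | Calibri |
| Arial Black | Arial |
| Calibri | Calibri Light |
| Cambria | Calibri |
| Trebuchet MS | Calibri |
| Impact | Arial |
| Palatino | Garamond |
| Consolas | Calibri |

| Element | Size |
|---|---|
| Slide title | 36-44pt bold |
| Section header | 20-24pt bold |
| Body text | 14-16pt |
| Captions | 10-12pt muted |
Spacing
- 0.5" minimum margins
- 0.3-0.5" between content blocks
- Leave breathing room - don't fill every inch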
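The margin rule above can be checked mechanically once element positions are known. The following is a minimal sketch, not part of the skill's scripts: the `Box` type and the 10" x 5.625" slide size are assumptions chosen for illustration (adjust them to the layout actually in use); only the 0.5" minimum comes from the spacing rules.

```python
from dataclasses import dataclass

SLIDE_W, SLIDE_H = 10.0, 5.625  # inches; assumed 16:9 slide size, not from the source
MIN_MARGIN = 0.5                # inches; from the spacing rules above


@dataclass
class Box:
    """Hypothetical element rectangle, all values in inches."""
    name: str
    x: float
    y: float
    w: float
    h: float


def margin_violations(boxes):
    """Return the names of boxes that sit closer than MIN_MARGIN to any slide edge."""
    bad = []
    for b in boxes:
        if (
            b.x < MIN_MARGIN
            or b.y < MIN_MARGIN
            or b.x + b.w > SLIDE_W - MIN_MARGIN
            or b.y + b.h > SLIDE_H - MIN_MARGIN
        ):
            bad.append(b.name)
    return bad


boxes = [Box("title", 0.5, 0.5, 9.0, 1.0), Box("chart", 6.0, 2.0, 3.8, 3.0)]
print(margin_violations(boxes))  # the chart's right edge is at 9.8", past the 9.5" limit
```

A check like this only catches geometry declared in code; it does not replace the visual inspection described in the QA section.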
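Avoid (Common Mistakes)
- Don't repeat the same layout - vary columns, cards, and callouts across slides
- Don't center body text - left-align paragraphs and lists; center only titles
- Don't skimp on size contrast - titles need 36pt+ to stand out from 14-16pt body text
- Don't default to blue - pick colors that reflect the specific topic
- Don't mix spacing randomly - choose 0.3" or 0.5" gaps and use them consistently
- Don't style one slide and leave the rest plain - commit fully or keep it simple throughout
- Don't create text-only slides - add images, icons, charts, or visual elements; avoid plain title + bullets
- Don't forget text box padding - when aligning lines or shapes with text edges, set margin: 0 on the text box or offset the shape to account for padding
- Don't use low-contrast elements - icons AND text need strong contrast against the background; avoid light text on light backgrounds or dark text on dark backgrounds
- NEVER use accent lines under titles - these are a hallmark of AI-generated slides; use whitespace or background color instead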
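The low-contrast rule in the list above can be made concrete with a contrast-ratio calculation. This sketch uses the WCAG 2.x relative-luminance formula, which is a swapped-in technique and not something the skill itself prescribes; the 4.5:1 threshold mentioned in the comment is the WCAG AA figure for normal-size text and is likewise an assumption here. Colors are 6-digit hex strings without `#`, as in the palette table.

```python
def _linear(c8: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.x relative luminance)."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a color given as 'RRGGBB'."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


# White text on the Midnight Executive navy clears 4.5:1 comfortably;
# ice-blue text on a white background does not.
print(round(contrast_ratio("FFFFFF", "1E2761"), 1))
print(round(contrast_ratio("CADCFC", "FFFFFF"), 1))
```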
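QA (Required)
Assume there are problems. Your job is to find them.
Your first render is almost never correct. Approach QA as a bug hunt, not a confirmation step. If you found zero issues on first inspection, you weren't looking hard enough.
Content QA
python -m markitdown output.pptx
Check for missing content, typos, and wrong order.
When using templates, check for leftover placeholder text:
python -m markitdown output.pptx | grep -iE "xxxx|lorem|ipsum|this.*(page|slide).*layout"
If grep returns results, fix them before declaring success.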
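Where a shell pipeline is inconvenient, the same placeholder check can be sketched in Python. The pattern is copied from the grep command above; the function name `find_placeholders` is hypothetical, and the sample text stands in for real markitdown output.

```python
import re

# Same pattern as the grep -iE command; re.IGNORECASE mirrors -i.
PLACEHOLDER_RE = re.compile(
    r"xxxx|lorem|ipsum|this.*(page|slide).*layout", re.IGNORECASE
)


def find_placeholders(markdown_text: str) -> list[str]:
    """Return every line that matches the placeholder pattern."""
    return [line for line in markdown_text.splitlines() if PLACEHOLDER_RE.search(line)]


sample = "\n".join(
    [
        "# Q3 Results",
        "Lorem ipsum dolor",
        "Revenue grew 12%",
        "This slide layout has two columns",
    ]
)
for hit in find_placeholders(sample):
    print(hit)
```

In practice the input would be the text produced by `python -m markitdown output.pptx`; an empty result means this particular check found nothing, not that the deck is clean.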
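Visual QA
**USE SUBAGENTS**, even for 2-3 slides. You've been staring at the code and will see what you expect, not what's there. Subagents have fresh eyes.
Convert slides to images (see Converting to Images), then use this prompt: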
Visually inspect these slides. Assume there are issues - find them.
Look for:
- Overlapping elements (text through shapes, lines through words, stacked elements)
- Text overflow or cut off at edges/box boundaries
- Decorative lines positioned for single-line text but title wrapped to two lines
- Source citations or footers colliding with content above
- Elements too close (< 0.3" gaps) or cards/sections nearly touching
- Uneven gaps (large empty area in one place, cramped in another)
- Insufficient margin from slide edges (< 0.5")
- Columns or similar elements not aligned consistently
- Low-contrast text (e.g., light gray text on cream-colored background)
- Low-contrast icons (e.g., dark icons on dark backgrounds without a contrasting circle)
- Text boxes too narrow causing excessive wrapping
- Leftover placeholder content
For each slide, list issues or areas of concern, even if minor.
Read and analyze these images:
1. /path/to/slide-01.jpg (Expected: [brief description])
2. /path/to/slide-02.jpg (Expected: [brief description])
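Report ALL issues found, including minor ones.
Verification Loop
- Generate slides -> convert to images -> inspect
- List issues found (if none are found, look again more critically)
- Fix issues
- Re-verify affected slides - one fix often creates another problem
- Repeat until a full pass reveals no new issues
Do not declare success until you've completed at least one fix-and-verify cycle.
Converting to Images
Convert presentations to individual slide images for visual inspection: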
python scripts/office/soffice.py --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 output.pdf slide
This creates slide-01.jpg, slide-02.jpg, etc.
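When the conversion is driven from a script, it can help to build the two commands as argument lists first and run them afterwards. This is a minimal sketch: `conversion_commands` is a hypothetical helper, and it assumes LibreOffice writes the PDF into the current working directory under the input's base name (its default when no output directory is given). The optional `page` argument maps to pdftoppm's `-f N -l N` flags for re-rendering a single slide.

```python
from pathlib import Path


def conversion_commands(pptx, prefix="slide", page=None, dpi=150):
    """Return [to_pdf, to_jpeg] argument lists mirroring the commands above."""
    # Assumption: the PDF lands in the current directory, named after the input file.
    pdf = Path(pptx).with_suffix(".pdf").name
    to_pdf = ["python", "scripts/office/soffice.py", "--headless",
              "--convert-to", "pdf", pptx]
    to_jpeg = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if page is not None:
        to_jpeg += ["-f", str(page), "-l", str(page)]  # first and last page to render
    to_jpeg += [pdf, prefix]
    return [to_pdf, to_jpeg]


for cmd in conversion_commands("output.pptx"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order; keeping them as lists avoids shell quoting problems with file names that contain spaces.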
To re-render specific slides after fixes:
pdftoppm -jpeg -r 150 -f N -l N output.pdf slide-fixed
Dependencies
- pip install "markitdown[pptx]" - text extraction
- pip install Pillow - thumbnail grids
- npm install -g pptxgenjs - creating from scratch
- LibreOffice (soffice) - PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
- Poppler (pdftoppm) - PDF to images
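Resource Files
LICENSE.txt
Binary resource
editing.md
Binary resource
pptxgenjs.md
Binary resource
scripts/__init__.py
Binary resource
scripts/add_slide.py
Binary resource
scripts/clean.py
Binary resource
scripts/office/helpers/__init__.py
Binary resource
scripts/office/helpers/merge_runs.py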
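As a quick orientation before the listing: the merge decision in this helper comes down to comparing the serialized `<w:rPr>` of two neighbouring runs. The sketch below illustrates only that comparison on a hand-written paragraph; it uses the standard-library `xml.dom.minidom` to stay self-contained (the helper itself uses `defusedxml`, which is the safer choice for untrusted files), and `rpr_xml` and `mergeable_pairs` are hypothetical names.

```python
from xml.dom import minidom

XML = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>Hel</w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>lo</w:t></w:r>"
    "<w:r><w:t> world</w:t></w:r>"
    "</w:p>"
)


def rpr_xml(run):
    """Serialized <w:rPr> of a run, or None when the run has no properties."""
    for child in run.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == "w:rPr":
            return child.toxml()
    return None


def mergeable_pairs(paragraph):
    """For each pair of neighbouring runs, report whether their formatting matches."""
    runs = [
        n for n in paragraph.childNodes
        if n.nodeType == n.ELEMENT_NODE and n.tagName == "w:r"
    ]
    return [rpr_xml(a) == rpr_xml(b) for a, b in zip(runs, runs[1:])]


paragraph = minidom.parseString(XML).documentElement
print(mergeable_pairs(paragraph))  # [True, False]
```

The two bold runs would be merged into one; the third run has no `<w:rPr>` and so stays separate.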
"""Merge adjacent runs with identical formatting in DOCX.

Merges adjacent <w:r> elements that have identical <w:rPr> properties.
Works on runs in paragraphs and inside tracked changes (<w:ins>, <w:del>).

Also:
- Removes rsid attributes from runs (revision metadata that doesn't affect rendering)
- Removes proofErr elements (spell/grammar markers that block merging)
"""
from pathlib import Path

import defusedxml.minidom


def merge_runs(input_dir: str) -> tuple[int, str]:
    doc_xml = Path(input_dir) / "word" / "document.xml"
    if not doc_xml.exists():
        return 0, f"Error: {doc_xml} not found"
    try:
        dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
        root = dom.documentElement
        _remove_elements(root, "proofErr")
        _strip_run_rsid_attrs(root)
        containers = {run.parentNode for run in _find_elements(root, "r")}
        merge_count = 0
        for container in containers:
            merge_count += _merge_runs_in(container)
        doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
        return merge_count, f"Merged {merge_count} runs"
    except Exception as e:
        return 0, f"Error: {e}"


def _find_elements(root, tag: str) -> list:
    results = []

    def traverse(node):
        if node.nodeType == node.ELEMENT_NODE:
            name = node.localName or node.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(node)
            for child in node.childNodes:
                traverse(child)

    traverse(root)
    return results


def _get_child(parent, tag: str):
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name == tag or name.endswith(f":{tag}"):
                return child
    return None


def _get_children(parent, tag: str) -> list:
    results = []
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(child)
    return results


def _is_adjacent(elem1, elem2) -> bool:
    node = elem1.nextSibling
    while node:
        if node == elem2:
            return True
        if node.nodeType == node.ELEMENT_NODE:
            return False
        if node.nodeType == node.TEXT_NODE and node.data.strip():
            return False
        node = node.nextSibling
    return False


def _remove_elements(root, tag: str):
    for elem in _find_elements(root, tag):
        if elem.parentNode:
            elem.parentNode.removeChild(elem)


def _strip_run_rsid_attrs(root):
    for run in _find_elements(root, "r"):
        for attr in list(run.attributes.values()):
            if "rsid" in attr.name.lower():
                run.removeAttribute(attr.name)


def _merge_runs_in(container) -> int:
    merge_count = 0
    run = _first_child_run(container)
    while run:
        while True:
            next_elem = _next_element_sibling(run)
            if next_elem and _is_run(next_elem) and _can_merge(run, next_elem):
                _merge_run_content(run, next_elem)
                container.removeChild(next_elem)
                merge_count += 1
            else:
                break
        _consolidate_text(run)
        run = _next_sibling_run(run)
    return merge_count


def _first_child_run(container):
    for child in container.childNodes:
        if child.nodeType == child.ELEMENT_NODE and _is_run(child):
            return child
    return None


def _next_element_sibling(node):
    sibling = node.nextSibling
    while sibling:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            return sibling
        sibling = sibling.nextSibling
    return None


def _next_sibling_run(node):
    sibling = node.nextSibling
    while sibling:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            if _is_run(sibling):
                return sibling
        sibling = sibling.nextSibling
    return None


def _is_run(node) -> bool:
    name = node.localName or node.tagName
    return name == "r" or name.endswith(":r")


def _can_merge(run1, run2) -> bool:
    rpr1 = _get_child(run1, "rPr")
    rpr2 = _get_child(run2, "rPr")
    if (rpr1 is None) != (rpr2 is None):
        return False
    if rpr1 is None:
        return True
    return rpr1.toxml() == rpr2.toxml()


def _merge_run_content(target, source):
    for child in list(source.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name != "rPr" and not name.endswith(":rPr"):
                target.appendChild(child)


def _consolidate_text(run):
    t_elements = _get_children(run, "t")
    for i in range(len(t_elements) - 1, 0, -1):
        curr, prev = t_elements[i], t_elements[i - 1]
        if _is_adjacent(prev, curr):
            prev_text = prev.firstChild.data if prev.firstChild else ""
            curr_text = curr.firstChild.data if curr.firstChild else ""
            merged = prev_text + curr_text
            if prev.firstChild:
                prev.firstChild.data = merged
            else:
                prev.appendChild(run.ownerDocument.createTextNode(merged))
            if merged.startswith(" ") or merged.endswith(" "):
                prev.setAttribute("xml:space", "preserve")
            elif prev.hasAttribute("xml:space"):
                prev.removeAttribute("xml:space")
            run.removeChild(curr)
scripts/office/helpers/simplify_redlines.py
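As a quick orientation before the listing: the author-inference step in this file (`infer_author`) reduces to a per-author difference between tracked-change counts in the modified and original documents. The sketch below isolates that arithmetic; `new_change_counts` is a hypothetical name and the counts are made-up sample data.

```python
def new_change_counts(modified: dict[str, int], original: dict[str, int]) -> dict[str, int]:
    """Authors whose tracked-change count grew, mapped to how many changes were added."""
    added = {}
    for author, count in modified.items():
        diff = count - original.get(author, 0)
        if diff > 0:
            added[author] = diff
    return added


original = {"Alice": 3}               # sample counts, not real data
modified = {"Alice": 3, "Claude": 2}
print(new_change_counts(modified, original))  # {'Claude': 2}
```

When exactly one author shows up in the result, that author is taken as the one to validate; when several do, the listing raises a `ValueError` instead of guessing.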
"""Simplify tracked changes by merging adjacent w:ins or w:del elements.

Merges adjacent <w:ins> elements from the same author into a single element.
Same for <w:del> elements. This makes heavily-redlined documents easier to
work with by reducing the number of tracked change wrappers.

Rules:
- Only merges w:ins with w:ins, w:del with w:del (same element type)
- Only merges if same author (ignores timestamp differences)
- Only merges if truly adjacent (only whitespace between them)
"""
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

import defusedxml.minidom

WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def simplify_redlines(input_dir: str) -> tuple[int, str]:
    doc_xml = Path(input_dir) / "word" / "document.xml"
    if not doc_xml.exists():
        return 0, f"Error: {doc_xml} not found"
    try:
        dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
        root = dom.documentElement
        merge_count = 0
        containers = _find_elements(root, "p") + _find_elements(root, "tc")
        for container in containers:
            merge_count += _merge_tracked_changes_in(container, "ins")
            merge_count += _merge_tracked_changes_in(container, "del")
        doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
        return merge_count, f"Simplified {merge_count} tracked changes"
    except Exception as e:
        return 0, f"Error: {e}"


def _merge_tracked_changes_in(container, tag: str) -> int:
    merge_count = 0
    tracked = [
        child
        for child in container.childNodes
        if child.nodeType == child.ELEMENT_NODE and _is_element(child, tag)
    ]
    if len(tracked) < 2:
        return 0
    i = 0
    while i < len(tracked) - 1:
        curr = tracked[i]
        next_elem = tracked[i + 1]
        if _can_merge_tracked(curr, next_elem):
            _merge_tracked_content(curr, next_elem)
            container.removeChild(next_elem)
            tracked.pop(i + 1)
            merge_count += 1
        else:
            i += 1
    return merge_count


def _is_element(node, tag: str) -> bool:
    name = node.localName or node.tagName
    return name == tag or name.endswith(f":{tag}")


def _get_author(elem) -> str:
    author = elem.getAttribute("w:author")
    if not author:
        for attr in elem.attributes.values():
            if attr.localName == "author" or attr.name.endswith(":author"):
                return attr.value
    return author


def _can_merge_tracked(elem1, elem2) -> bool:
    if _get_author(elem1) != _get_author(elem2):
        return False
    node = elem1.nextSibling
    while node and node != elem2:
        if node.nodeType == node.ELEMENT_NODE:
            return False
        if node.nodeType == node.TEXT_NODE and node.data.strip():
            return False
        node = node.nextSibling
    return True


def _merge_tracked_content(target, source):
    while source.firstChild:
        child = source.firstChild
        source.removeChild(child)
        target.appendChild(child)


def _find_elements(root, tag: str) -> list:
    results = []

    def traverse(node):
        if node.nodeType == node.ELEMENT_NODE:
            name = node.localName or node.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(node)
            for child in node.childNodes:
                traverse(child)

    traverse(root)
    return results


def get_tracked_change_authors(doc_xml_path: Path) -> dict[str, int]:
    if not doc_xml_path.exists():
        return {}
    try:
        tree = ET.parse(doc_xml_path)
        root = tree.getroot()
    except ET.ParseError:
        return {}
    namespaces = {"w": WORD_NS}
    author_attr = f"{{{WORD_NS}}}author"
    authors: dict[str, int] = {}
    for tag in ["ins", "del"]:
        for elem in root.findall(f".//w:{tag}", namespaces):
            author = elem.get(author_attr)
            if author:
                authors[author] = authors.get(author, 0) + 1
    return authors


def _get_authors_from_docx(docx_path: Path) -> dict[str, int]:
    try:
        with zipfile.ZipFile(docx_path, "r") as zf:
            if "word/document.xml" not in zf.namelist():
                return {}
            with zf.open("word/document.xml") as f:
                tree = ET.parse(f)
                root = tree.getroot()
                namespaces = {"w": WORD_NS}
                author_attr = f"{{{WORD_NS}}}author"
                authors: dict[str, int] = {}
                for tag in ["ins", "del"]:
                    for elem in root.findall(f".//w:{tag}", namespaces):
                        author = elem.get(author_attr)
                        if author:
                            authors[author] = authors.get(author, 0) + 1
                return authors
    except (zipfile.BadZipFile, ET.ParseError):
        return {}


def infer_author(modified_dir: Path, original_docx: Path, default: str = "Claude") -> str:
    modified_xml = modified_dir / "word" / "document.xml"
    modified_authors = get_tracked_change_authors(modified_xml)
    if not modified_authors:
        return default
    original_authors = _get_authors_from_docx(original_docx)
    new_changes: dict[str, int] = {}
    for author, count in modified_authors.items():
        original_count = original_authors.get(author, 0)
        diff = count - original_count
        if diff > 0:
            new_changes[author] = diff
    if not new_changes:
        return default
    if len(new_changes) == 1:
        return next(iter(new_changes))
    raise ValueError(
        f"Multiple authors added new changes: {new_changes}. "
        "Cannot infer which author to validate."
    )
scripts/office/pack.py
"""Pack a directory into a DOCX, PPTX, or XLSX file.
Validates with auto-repair, condenses XML formatting, and creates the Office file.

Usage:
    python pack.py <input_directory> <output_file> [--original <file>] [--validate true|false]

Examples:
    python pack.py unpacked/ output.docx --original input.docx
    python pack.py unpacked/ output.pptx --validate false
"""
import argparse
import sys
import shutil
import tempfile
import zipfile
from pathlib import Path

import defusedxml.minidom

from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator


def pack(
    input_directory: str,
    output_file: str,
    original_file: str | None = None,
    validate: bool = True,
    infer_author_func=None,
) -> tuple[None, str]:
    input_dir = Path(input_directory)
    output_path = Path(output_file)
    suffix = output_path.suffix.lower()
    if not input_dir.is_dir():
        return None, f"Error: {input_dir} is not a directory"
    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {output_file} must be a .docx, .pptx, or .xlsx file"
    if validate and original_file:
        original_path = Path(original_file)
        if original_path.exists():
            success, output = _run_validation(
                input_dir, original_path, suffix, infer_author_func
            )
            if output:
                print(output)
            if not success:
                return None, f"Error: Validation failed for {input_dir}"
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_content_dir = Path(temp_dir) / "content"
        shutil.copytree(input_dir, temp_content_dir)
        for pattern in ["*.xml", "*.rels"]:
            for xml_file in temp_content_dir.rglob(pattern):
                _condense_xml(xml_file)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in temp_content_dir.rglob("*"):
                if f.is_file():
                    zf.write(f, f.relative_to(temp_content_dir))
    return None, f"Successfully packed {input_dir} to {output_file}"


def _run_validation(
    unpacked_dir: Path,
    original_file: Path,
    suffix: str,
    infer_author_func=None,
) -> tuple[bool, str | None]:
    output_lines = []
    validators = []
    if suffix == ".docx":
        author = "Claude"
        if infer_author_func:
            try:
                author = infer_author_func(unpacked_dir, original_file)
            except ValueError as e:
                print(f"Warning: {e} Using default author 'Claude'.", file=sys.stderr)
        validators = [
            DOCXSchemaValidator(unpacked_dir, original_file),
            RedliningValidator(unpacked_dir, original_file, author=author),
        ]
    elif suffix == ".pptx":
        validators = [PPTXSchemaValidator(unpacked_dir, original_file)]
    if not validators:
        return True, None
    total_repairs = sum(v.repair() for v in validators)
    if total_repairs:
        output_lines.append(f"Auto-repaired {total_repairs} issue(s)")
    success = all(v.validate() for v in validators)
    if success:
        output_lines.append("All validations PASSED!")
    return success, "\n".join(output_lines) if output_lines else None


def _condense_xml(xml_file: Path) -> None:
    try:
        with open(xml_file, encoding="utf-8") as f:
            dom = defusedxml.minidom.parse(f)
        for element in dom.getElementsByTagName("*"):
            if element.tagName.endswith(":t"):
                continue
            for child in list(element.childNodes):
                if (
                    child.nodeType == child.TEXT_NODE
                    and child.nodeValue
                    and child.nodeValue.strip() == ""
                ) or child.nodeType == child.COMMENT_NODE:
                    element.removeChild(child)
        xml_file.write_bytes(dom.toxml(encoding="UTF-8"))
    except Exception as e:
        print(f"ERROR: Failed to parse {xml_file.name}: {e}", file=sys.stderr)
        raise


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Pack a directory into a DOCX, PPTX, or XLSX file"
    )
    parser.add_argument("input_directory", help="Unpacked Office document directory")
    parser.add_argument("output_file", help="Output Office file (.docx/.pptx/.xlsx)")
    parser.add_argument(
        "--original",
        help="Original file for validation comparison",
    )
    parser.add_argument(
        "--validate",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Run validation with auto-repair (default: true)",
    )
    args = parser.parse_args()
    _, message = pack(
        args.input_directory,
        args.output_file,
        original_file=args.original,
        validate=args.validate,
    )
    print(message)
    if "Error" in message:
        sys.exit(1)
scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd
Binary resource
scripts/office/schemas/mce/mc.xsd
Binary resource
scripts/office/schemas/microsoft/wml-2010.xsd
Binary resource
scripts/office/schemas/microsoft/wml-2012.xsd
Binary resource
scripts/office/schemas/microsoft/wml-2018.xsd
Binary resource
scripts/office/schemas/microsoft/wml-cex-2018.xsd
Binary resource
scripts/office/schemas/microsoft/wml-cid-2016.xsd
Binary resource
scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd
Binary resource
scripts/office/schemas/microsoft/wml-symex-2015.xsd
Binary resource
scripts/office/soffice.py
"""
Helper for running LibreOffice (soffice) in environments where AF_UNIX
sockets may be blocked (e.g., sandboxed VMs). Detects the restriction
at runtime and applies an LD_PRELOAD shim if needed.

Usage:
    from office.soffice import run_soffice, get_soffice_env

    # Option 1 – run soffice directly
    result = run_soffice(["--headless", "--convert-to", "pdf", "input.docx"])

    # Option 2 – get env dict for your own subprocess calls
    env = get_soffice_env()
    subprocess.run(["soffice", ...], env=env)
"""
import os
import socket
import subprocess
import tempfile
from pathlib import Path


def get_soffice_env() -> dict:
    env = os.environ.copy()
    env["SAL_USE_VCLPLUGIN"] = "svp"
    if _needs_shim():
        shim = _ensure_shim()
        env["LD_PRELOAD"] = str(shim)
    return env


def run_soffice(args: list[str], **kwargs) -> subprocess.CompletedProcess:
    env = get_soffice_env()
    return subprocess.run(["soffice"] + args, env=env, **kwargs)


_SHIM_SO = Path(tempfile.gettempdir()) / "lo_socket_shim.so"


def _needs_shim() -> bool:
    try:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.close()
        return False
    except OSError:
        return True


def _ensure_shim() -> Path:
    if _SHIM_SO.exists():
        return _SHIM_SO
    src = Path(tempfile.gettempdir()) / "lo_socket_shim.c"
    src.write_text(_SHIM_SOURCE)
    subprocess.run(
        ["gcc", "-shared", "-fPIC", "-o", str(_SHIM_SO), str(src), "-ldl"],
        check=True,
        capture_output=True,
    )
    src.unlink()
    return _SHIM_SO


_SHIM_SOURCE = r"""
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

static int (*real_socket)(int, int, int);
static int (*real_socketpair)(int, int, int, int[2]);
static int (*real_listen)(int, int);
static int (*real_accept)(int, struct sockaddr *, socklen_t *);
static int (*real_close)(int);
static int (*real_read)(int, void *, size_t);

/* Per-FD bookkeeping (FDs >= 1024 are passed through unshimmed). */
static int is_shimmed[1024];
static int peer_of[1024];
static int wake_r[1024];     /* accept() blocks reading this */
static int wake_w[1024];     /* close() writes to this */
static int listener_fd = -1; /* FD that received listen() */

__attribute__((constructor))
static void init(void) {
    real_socket = dlsym(RTLD_NEXT, "socket");
    real_socketpair = dlsym(RTLD_NEXT, "socketpair");
    real_listen = dlsym(RTLD_NEXT, "listen");
    real_accept = dlsym(RTLD_NEXT, "accept");
    real_close = dlsym(RTLD_NEXT, "close");
    real_read = dlsym(RTLD_NEXT, "read");
    for (int i = 0; i < 1024; i++) {
        peer_of[i] = -1;
        wake_r[i] = -1;
        wake_w[i] = -1;
    }
}

/* ---- socket ---------------------------------------------------------- */
int socket(int domain, int type, int protocol) {
    if (domain == AF_UNIX) {
        int fd = real_socket(domain, type, protocol);
        if (fd >= 0) return fd;
        /* socket(AF_UNIX) blocked – fall back to socketpair(). */
        int sv[2];
        if (real_socketpair(domain, type, protocol, sv) == 0) {
            if (sv[0] >= 0 && sv[0] < 1024) {
                is_shimmed[sv[0]] = 1;
                peer_of[sv[0]] = sv[1];
                int wp[2];
                if (pipe(wp) == 0) {
                    wake_r[sv[0]] = wp[0];
                    wake_w[sv[0]] = wp[1];
                }
            }
            return sv[0];
        }
        errno = EPERM;
        return -1;
    }
    return real_socket(domain, type, protocol);
}

/* ---- listen ---------------------------------------------------------- */
int listen(int sockfd, int backlog) {
    if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
        listener_fd = sockfd;
        return 0;
    }
    return real_listen(sockfd, backlog);
}

/* ---- accept ---------------------------------------------------------- */
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen) {
    if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
        /* Block until close() writes to the wake pipe. */
        if (wake_r[sockfd] >= 0) {
            char buf;
            real_read(wake_r[sockfd], &buf, 1);
        }
        errno = ECONNABORTED;
        return -1;
    }
    return real_accept(sockfd, addr, addrlen);
}

/* ---- close ----------------------------------------------------------- */
int close(int fd) {
    if (fd >= 0 && fd < 1024 && is_shimmed[fd]) {
        int was_listener = (fd == listener_fd);
        is_shimmed[fd] = 0;
        if (wake_w[fd] >= 0) { /* unblock accept() */
            char c = 0;
            write(wake_w[fd], &c, 1);
            real_close(wake_w[fd]);
            wake_w[fd] = -1;
        }
        if (wake_r[fd] >= 0) { real_close(wake_r[fd]); wake_r[fd] = -1; }
        if (peer_of[fd] >= 0) { real_close(peer_of[fd]); peer_of[fd] = -1; }
        if (was_listener)
            _exit(0); /* conversion done – exit */
    }
    return real_close(fd);
}
"""

if __name__ == "__main__":
    import sys

    result = run_soffice(sys.argv[1:])
    sys.exit(result.returncode)
scripts/office/unpack.py
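As a quick orientation before the listing: unpacking an Office file is, at its core, extracting a ZIP archive and pretty-printing each XML part so it can be edited by hand. The sketch below shows that idea on an in-memory archive. The part name and the `urn:example` namespace are made up for illustration, and the standard-library `xml.dom.minidom` stands in for the `defusedxml` parser the script uses.

```python
import io
import zipfile
from xml.dom import minidom

# Build a tiny stand-in archive in memory (a real .pptx has many more parts).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "ppt/slides/slide1.xml",
        "<p:sld xmlns:p='urn:example'><p:cSld/></p:sld>",
    )

# Read the part back and pretty-print it, as the unpack step does per XML file.
with zipfile.ZipFile(buf) as zf:
    raw = zf.read("ppt/slides/slide1.xml")

pretty = minidom.parseString(raw).toprettyxml(indent="  ")
print(pretty)
```

The single-line XML comes back with one element per line and nested elements indented, which is what makes the unpacked files practical to diff and edit.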
"""Unpack Office files (DOCX, PPTX, XLSX) for editing.
Extracts the ZIP archive, pretty-prints XML files, and optionally:
- Merges adjacent runs with identical formatting (DOCX only)
- Simplifies adjacent tracked changes from same author (DOCX only)

Usage:
    python unpack.py <office_file> <output_dir> [options]

Examples:
    python unpack.py document.docx unpacked/
    python unpack.py presentation.pptx unpacked/
    python unpack.py document.docx unpacked/ --merge-runs false
"""
import argparse
import sys
import zipfile
from pathlib import Path

import defusedxml.minidom

from helpers.merge_runs import merge_runs as do_merge_runs
from helpers.simplify_redlines import simplify_redlines as do_simplify_redlines

# Map smart quotes to XML numeric character references so they stay ASCII while editing.
SMART_QUOTE_REPLACEMENTS = {
    "\u201c": "&#x201C;",
    "\u201d": "&#x201D;",
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
}


def unpack(
    input_file: str,
    output_directory: str,
    merge_runs: bool = True,
    simplify_redlines: bool = True,
) -> tuple[None, str]:
    input_path = Path(input_file)
    output_path = Path(output_directory)
    suffix = input_path.suffix.lower()
    if not input_path.exists():
        return None, f"Error: {input_file} does not exist"
    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {input_file} must be a .docx, .pptx, or .xlsx file"
    try:
        output_path.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(input_path, "r") as zf:
            zf.extractall(output_path)
        xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels"))
        for xml_file in xml_files:
            _pretty_print_xml(xml_file)
        message = f"Unpacked {input_file} ({len(xml_files)} XML files)"
        if suffix == ".docx":
            if simplify_redlines:
                simplify_count, _ = do_simplify_redlines(str(output_path))
                message += f", simplified {simplify_count} tracked changes"
            if merge_runs:
                merge_count, _ = do_merge_runs(str(output_path))
                message += f", merged {merge_count} runs"
        for xml_file in xml_files:
            _escape_smart_quotes(xml_file)
        return None, message
    except zipfile.BadZipFile:
        return None, f"Error: {input_file} is not a valid Office file"
    except Exception as e:
        return None, f"Error unpacking: {e}"


def _pretty_print_xml(xml_file: Path) -> None:
    try:
        content = xml_file.read_text(encoding="utf-8")
        dom = defusedxml.minidom.parseString(content)
        xml_file.write_bytes(dom.toprettyxml(indent=" ", encoding="utf-8"))
    except Exception:
        pass


def _escape_smart_quotes(xml_file: Path) -> None:
    try:
        content = xml_file.read_text(encoding="utf-8")
        for char, entity in SMART_QUOTE_REPLACEMENTS.items():
            content = content.replace(char, entity)
        xml_file.write_text(content, encoding="utf-8")
    except Exception:
        pass


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Unpack an Office file (DOCX, PPTX, XLSX) for editing"
    )
    parser.add_argument("input_file", help="Office file to unpack")
    parser.add_argument("output_directory", help="Output directory")
    parser.add_argument(
        "--merge-runs",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent runs with identical formatting (DOCX only, default: true)",
    )
    parser.add_argument(
        "--simplify-redlines",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent tracked changes from same author (DOCX only, default: true)",
    )
    args = parser.parse_args()
    _, message = unpack(
        args.input_file,
        args.output_directory,
        merge_runs=args.merge_runs,
        simplify_redlines=args.simplify_redlines,
    )
    print(message)
    if "Error" in message:
        sys.exit(1)
scripts/office/validate.py
"""
Command line tool to validate Office document XML files against XSD schemas and tracked changes.

Usage:
    python validate.py <path> [--original <original_file>] [--auto-repair] [--author NAME]

The first argument can be either:
- An unpacked directory containing the Office document XML files
- A packed Office file (.docx/.pptx/.xlsx) which will be unpacked to a temp directory

Auto-repair fixes:
- paraId/durableId values that exceed OOXML limits
- Missing xml:space="preserve" on w:t elements with whitespace
"""
import argparse
import sys
import tempfile
import zipfile
from pathlib import Path

from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator


def main():
    parser = argparse.ArgumentParser(description="Validate Office document XML files")
    parser.add_argument(
        "path",
        help="Path to unpacked directory or packed Office file (.docx/.pptx/.xlsx)",
    )
    parser.add_argument(
        "--original",
        required=False,
        default=None,
        help="Path to original file (.docx/.pptx/.xlsx). If omitted, all XSD errors are reported and redlining validation is skipped.",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Enable verbose output",
    )
    parser.add_argument(
        "--auto-repair",
        action="store_true",
        help="Automatically repair common issues (hex IDs, whitespace preservation)",
    )
    parser.add_argument(
        "--author",
        default="Claude",
        help="Author name for redlining validation (default: Claude)",
    )
    args = parser.parse_args()
    path = Path(args.path)
    assert path.exists(), f"Error: {path} does not exist"
    original_file = None
    if args.original:
        original_file = Path(args.original)
        assert original_file.is_file(), f"Error: {original_file} is not a file"
        assert original_file.suffix.lower() in [".docx", ".pptx", ".xlsx"], (
            f"Error: {original_file} must be a .docx, .pptx, or .xlsx file"
        )
    file_extension = (original_file or path).suffix.lower()
    assert file_extension in [".docx", ".pptx", ".xlsx"], (
        f"Error: Cannot determine file type from {path}. Use --original or provide a .docx/.pptx/.xlsx file."
    )
    if path.is_file() and path.suffix.lower() in [".docx", ".pptx", ".xlsx"]:
        temp_dir = tempfile.mkdtemp()
        with zipfile.ZipFile(path, "r") as zf:
            zf.extractall(temp_dir)
        unpacked_dir = Path(temp_dir)
    else:
        assert path.is_dir(), f"Error: {path} is not a directory or Office file"
        unpacked_dir = path
    match file_extension:
        case ".docx":
            validators = [
                DOCXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
            ]
            if original_file:
                validators.append(
                    RedliningValidator(unpacked_dir, original_file, verbose=args.verbose, author=args.author)
                )
        case ".pptx":
            validators = [
                PPTXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
            ]
        case _:
            print(f"Error: Validation not supported for file type {file_extension}")
            sys.exit(1)
    if args.auto_repair:
        total_repairs = sum(v.repair() for v in validators)
        if total_repairs:
            print(f"Auto-repaired {total_repairs} issue(s)")
    success = all(v.validate() for v in validators)
    if success:
        print("All validations PASSED!")
    sys.exit(0 if success else 1)


if __name__ == "__main__":
    main()
scripts/office/validators/__init__.py
"""
Validation modules for Word document processing.
"""
from .base import BaseSchemaValidator
from .docx import DOCXSchemaValidator
from .pptx import PPTXSchemaValidator
from .redlining import RedliningValidator

__all__ = [
    "BaseSchemaValidator",
    "DOCXSchemaValidator",
    "PPTXSchemaValidator",
    "RedliningValidator",
]
scripts/office/validators/base.py
Binary resource
scripts/office/validators/docx.py
Binary resource
scripts/office/validators/pptx.py
Binary resource
scripts/office/validators/redlining.py
Binary resource
scripts/thumbnail.py
Binary resource