Xlsx
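Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path, even casually (e.g., "the xlsx I downloaded"), and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do not trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.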
Source: content adapted from anthropics/skills (MIT).
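All Excel Files
Professional Font
- Use a consistent, professional font (e.g., Arial, Times New Roman) for all deliverables unless the user instructs otherwise
Zero Formula Errors
- Every Excel model must be delivered with zero formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
Preserve Existing Templates (when updating templates)
- When modifying files, study and exactly match the existing format, style, and conventions
- Never impose standardized formatting on files with established patterns
- Existing template conventions always take precedence over these guidelines
Financial Models
Color Coding Standards
Unless otherwise specified by the user or the existing template
Industry-Standard Color Conventions
- Blue text (RGB: 0,0,255): hardcoded inputs and numbers users will change for scenarios
- Black text (RGB: 0,0,0): all formulas and calculations
- Green text (RGB: 0,128,0): links pulling from other worksheets in the same workbook
- Red text (RGB: 255,0,0): external links to other files
- Yellow background (RGB: 255,255,0): key assumptions needing attention or cells that need to be updated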
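A minimal sketch of applying these conventions with openpyxl. The style constants (INPUT_FONT, LINK_FONT, and so on), sheet names, and cell positions are illustrative assumptions, not part of any template; colors are passed as hex RGB strings, so 0000FF corresponds to RGB 0,0,255.

```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

# Illustrative style constants for the color conventions (hex RGB)
INPUT_FONT = Font(name="Arial", color="0000FF")     # blue: hardcoded inputs
FORMULA_FONT = Font(name="Arial", color="000000")   # black: formulas and calculations
LINK_FONT = Font(name="Arial", color="008000")      # green: links to other sheets
EXTERNAL_FONT = Font(name="Arial", color="FF0000")  # red: links to other files
ATTENTION_FILL = PatternFill("solid", start_color="FFFF00")  # yellow: key assumptions

wb = Workbook()
ws = wb.active
ws.title = "Model"
inputs = wb.create_sheet("Inputs")
inputs["A1"] = 250

ws["B5"] = 1000                # hardcoded input
ws["B5"].font = INPUT_FONT
ws["B6"] = 0.05                # key assumption flagged for review
ws["B6"].font = INPUT_FONT
ws["B6"].fill = ATTENTION_FILL
ws["C5"] = "=B5*(1+$B$6)"      # calculation
ws["C5"].font = FORMULA_FONT
ws["B7"] = "=Inputs!A1"        # link to another sheet in this workbook
ws["B7"].font = LINK_FONT
```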
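Number Formatting Standards
Required Format Rules
- Years: format as text strings (e.g., "2024", not "2,024")
- Currency: use the $#,##0 format; always specify units in headers ("Revenue ($mm)")
- Zeros: use number formatting to display all zeros as "-", including percentages (e.g., "$#,##0;($#,##0);-")
- Percentages: default to the 0.0% format (one decimal place)
- Multiples: format valuation multiples (EV/EBITDA, P/E) as 0.0x
- Negative numbers: use parentheses, (123), rather than a minus sign, -123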
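One way to express these rules as openpyxl number formats, as a sketch. The FORMATS dictionary and the row layout are illustrative. Each format uses Excel's positive;negative;zero sections, and the literal x in the multiple format is quoted ("x") so Excel treats it as text rather than a format code.

```python
from openpyxl import Workbook

# Positive;negative;zero sections: negatives in parentheses, zeros shown as "-"
FORMATS = {
    "currency": "$#,##0;($#,##0);-",
    "percent": "0.0%;(0.0%);-",
    "multiple": '0.0"x"',
}

wb = Workbook()
ws = wb.active
ws.append(["Fiscal year", "2024", "2025"])   # years written as text strings
ws.append(["Revenue ($mm)", 1250, 0])        # units stated in the header label
ws.append(["Revenue growth", 0.083, -0.042])
ws.append(["EV/EBITDA", 8.5, 9.25])

# Apply one format per row to the value cells (columns B onward)
for row, key in ((2, "currency"), (3, "percent"), (4, "multiple")):
    for cell in ws[row][1:]:
        cell.number_format = FORMATS[key]
```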
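Formula Construction Rules
Assumptions Placement
- Place all assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
- Use cell references instead of hardcoded values in formulas
- Example: use =B5*(1+$B$6) instead of =B5*1.05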
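A small sketch of this pattern with openpyxl: the growth rate lives in B6, and every projection formula points at it through the absolute reference $B$6, so changing one input reflows the whole row. The layout (base year in B5, projections in C5:F5) is an assumption for illustration.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active

ws["A5"] = "Revenue ($mm)"
ws["B5"] = 1000            # base-year value
ws["A6"] = "Revenue growth"
ws["B6"] = 0.05            # assumption cell

# Columns C..F: each period grows the prior period by the assumption in $B$6
for col in range(3, 7):
    prev = get_column_letter(col - 1)
    ws.cell(row=5, column=col, value=f"={prev}5*(1+$B$6)")
```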
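Formula Error Prevention
- Verify that all cell references are correct
- Check for off-by-one errors in ranges
- Ensure formulas are consistent across all projection periods
- Test with edge cases (zero values, negative numbers)
- Verify there are no unintended circular references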
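To check that projection formulas stay consistent, one option is openpyxl's Translator, which shifts the relative references in a formula to a new cell while leaving absolute references such as $B$6 fixed. The helper below (inconsistent_cells, a hypothetical name) compares each cell in a row with the first formula translated to that position. It is a sketch that assumes every cell in the range holds a formula string.

```python
from openpyxl import Workbook
from openpyxl.formula.translate import Translator


def inconsistent_cells(ws, row, first_col, last_col):
    """Return coordinates whose formula differs from the row's first formula
    translated to that cell's position."""
    anchor = ws.cell(row=row, column=first_col)
    bad = []
    for col in range(first_col + 1, last_col + 1):
        cell = ws.cell(row=row, column=col)
        expected = Translator(anchor.value, origin=anchor.coordinate).translate_formula(cell.coordinate)
        if cell.value != expected:
            bad.append(cell.coordinate)
    return bad


wb = Workbook()
ws = wb.active
ws["C5"] = "=B5*(1+$B$6)"
ws["D5"] = "=C5*(1+$B$6)"
ws["E5"] = "=D5*(1+$B$6)"
ws["F5"] = "=E5*1.05"  # hardcoded growth slipped into one period

print(inconsistent_cells(ws, row=5, first_col=3, last_col=6))  # ['F5']
```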
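Documentation Requirements for Hardcodes
- Document each hardcoded value in a cell comment, or in the adjacent cell if it sits at the end of a table. Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
- Examples:
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"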
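A minimal sketch of attaching such a note as a cell comment with openpyxl. source_note is a hypothetical helper that assembles the string in the format above, and "Analyst" is a placeholder comment author.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment


def source_note(document, date, reference, url=None):
    """Build 'Source: [System/Document], [Date], [Specific Reference], [URL]'."""
    parts = [document, date, reference] + ([url] if url else [])
    return "Source: " + ", ".join(parts)


wb = Workbook()
ws = wb.active
ws["B5"] = 1000  # hardcoded revenue figure taken from a filing
ws["B5"].comment = Comment(source_note("Company 10-K", "FY2024", "Page 45, Revenue Note"), "Analyst")
```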
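XLSX Creation, Editing, and Analysis
Overview
A user may ask you to create, edit, or analyze the contents of an .xlsx file. Different tools and workflows are available for different tasks.
Important Requirements
LibreOffice is required for formula recalculation: you can assume LibreOffice is installed and use the scripts/recalc.py script to recalculate formula values. On first run the script configures LibreOffice automatically, including in sandboxed environments where Unix sockets are restricted (handled by scripts/office/soffice.py).
Reading and Analyzing Data
Data Analysis with pandas
For data analysis, visualization, and basic operations, use pandas, which provides powerful data manipulation capabilities: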
import pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Analyze
df.head() # Preview data
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)
Excel File Workflows
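Critical: Use Formulas, Not Hardcoded Values
Always use Excel formulas instead of calculating values in Python and hardcoding them. This keeps the spreadsheet dynamic and updatable.
Wrong: hardcoding calculated values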
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5
Correct: using Excel formulas
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
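sheet['D20'] = '=AVERAGE(D2:D19)'
This applies to all calculations: totals, percentages, ratios, differences, and so on. The spreadsheet should recalculate whenever the source data changes.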
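When the data starts out in pandas, one approach is to write only the raw values and let Excel compute the total. This sketch uses openpyxl.utils.dataframe.dataframe_to_rows; the DataFrame contents and cell layout are illustrative, and the workbook would still need to be saved and recalculated as described below.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"Region": ["North", "South", "West"], "Sales": [1200, 950, 1430]})

wb = Workbook()
ws = wb.active
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)  # header goes to row 1, data to rows 2..n

last = ws.max_row  # last data row
ws.cell(row=last + 1, column=1, value="Total")
ws.cell(row=last + 1, column=2, value=f"=SUM(B2:B{last})")  # Excel computes it, not Python
```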
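Common Workflow
- Choose a tool: pandas for data, openpyxl for formulas and formatting
- Create/load: create a new workbook or load an existing file
- Modify: add or edit data, formulas, and formatting
- Save: write to file
- Recalculate formulas (mandatory if using formulas): use the scripts/recalc.py script
python scripts/recalc.py output.xlsx
- Verify and fix any errors:
- The script returns JSON with error details
- If status is errors_found, check error_summary for the specific error types and locations
- Fix the identified errors and recalculate again
- Common errors to fix:
- #REF!: invalid cell reference
- #DIV/0!: division by zero
- #VALUE!: wrong data type in a formula
- #NAME?: unrecognized formula name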
Creating New Excel Files
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
Editing Existing Excel Files
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheet
# Working with multiple sheets
for sheet_name in wb.sheetnames:
    sheet = wb[sheet_name]
    print(f"Sheet: {sheet_name}")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
Recalculating Formulas
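Excel files created or modified by openpyxl contain formulas as strings but not their calculated values. Use the provided scripts/recalc.py script to recalculate formulas:
python scripts/recalc.py <excel_file> [timeout_seconds]
Example:
python scripts/recalc.py output.xlsx 30
The script:
- Automatically sets up the LibreOffice macro on first run
- Recalculates all formulas in all sheets
- Scans every cell for Excel errors (#REF!, #DIV/0!, etc.)
- Returns JSON with detailed error locations and counts
- Works on Linux and macOS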
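Because the script reports its results as JSON, a caller can turn them into a list of cells to fix. A sketch: run_recalc and error_locations are hypothetical helper names, and run_recalc assumes the JSON object is the only thing the script prints to stdout. The field names (status, error_summary, locations) follow the output format documented for the script.

```python
import json
import subprocess
import sys


def run_recalc(path, timeout=30):
    """Run scripts/recalc.py and parse its stdout (assumed to be pure JSON)."""
    proc = subprocess.run(
        [sys.executable, "scripts/recalc.py", path, str(timeout)],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def error_locations(result):
    """Flatten error_summary into 'Sheet!Cell: #ERR' strings; empty when no errors."""
    if result.get("status") != "errors_found":
        return []
    return [
        f"{loc}: {err}"
        for err, info in result.get("error_summary", {}).items()
        for loc in info.get("locations", [])
    ]
```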
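Formula Verification Checklist
Quick checks to ensure formulas work correctly:
Essential Verification
- Test 2-3 sample references: verify they pull the correct values before building the full model
- Column mapping: confirm Excel columns match (e.g., column 64 = BL, not BK)
- Row offset: remember that Excel rows are 1-indexed while DataFrame positions are 0-indexed (with header=None, DataFrame row 5 = Excel row 6; with the default header row, DataFrame row 5 = Excel row 7)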
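openpyxl ships helpers for the column mapping (get_column_letter and column_index_from_string). For the row offset, excel_row below is a hypothetical helper that assumes the sheet was read starting at row 1 with no skipped rows.

```python
from openpyxl.utils import column_index_from_string, get_column_letter


def excel_row(df_pos: int, header_rows: int = 1) -> int:
    """Excel row number for a 0-based DataFrame position.

    header_rows is the number of sheet rows above the data:
    1 for pandas' default header row, 0 when read with header=None.
    """
    return df_pos + header_rows + 1


print(get_column_letter(64))           # BL
print(column_index_from_string("BL"))  # 64
print(excel_row(5))                    # 7
print(excel_row(5, header_rows=0))     # 6
```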
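Common Pitfalls
- NaN handling: check for null values with pd.notna()
- Far-right columns: FY data is often in columns 50+
- Multiple matches: search all occurrences, not just the first
- Division by zero: check denominators before using / in formulas (#DIV/0!)
- Wrong references: verify all cell references point to the intended cells (#REF!)
- Cross-sheet references: use the correct format (Sheet1!A1) to link sheets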
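For the multiple-matches pitfall, a sketch of a helper that scans the whole sheet instead of stopping at the first hit. find_all is a hypothetical name; it assumes exact equality, so labels with stray whitespace would need normalizing first.

```python
from openpyxl import Workbook


def find_all(ws, target):
    """Return the coordinate of every cell whose value equals target."""
    return [
        cell.coordinate
        for row in ws.iter_rows()
        for cell in row
        if cell.value == target
    ]


wb = Workbook()
ws = wb.active
ws["A2"] = "Revenue"
ws["A10"] = "Revenue"  # second block further down, e.g. a restated table
ws["B10"] = "Net income"

print(find_all(ws, "Revenue"))  # ['A2', 'A10']
```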
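Formula Testing Strategy
- Start small: test formulas on 2-3 cells before applying them broadly
- Verify dependencies: check that every cell referenced in a formula exists
- Test edge cases: include zero, negative, and very large values
Interpreting scripts/recalc.py Output
The script returns JSON with error details: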
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}
Best Practices
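Library Selection
- pandas: best for data analysis, bulk operations, and simple data export
- openpyxl: best for complex formatting, formulas, and Excel-specific features
Working with openpyxl
- Cell indices are 1-based (row=1, column=1 refers to cell A1)
- Use data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)
- Warning: if a workbook opened with data_only=True is saved, its formulas are replaced with values and permanently lost
- For large files: use read_only=True when reading or write_only=True when writing
- Formulas are preserved but not evaluated: use scripts/recalc.py to update values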
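The difference between the two load modes is easy to see on a file openpyxl has just written: the formula survives, but no computed value exists until the file is recalculated (for example with scripts/recalc.py). A sketch using a throwaway file in a temporary directory:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=A1+A2"
wb.save(path)

formula = load_workbook(path).active["A3"].value                 # the formula string
cached = load_workbook(path, data_only=True).active["A3"].value  # no cached value yet
print(formula, cached)  # =A1+A2 None
```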
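Working with pandas
- Specify data types to avoid inference issues: pd.read_excel('file.xlsx', dtype={'id': str})
- For large files, read only specific columns: pd.read_excel('file.xlsx', usecols='A,C,E') (Excel column letters go in a single string; a list such as ['A', 'C', 'E'] is matched against header names instead)
- Handle dates properly: pd.read_excel('file.xlsx', parse_dates=['date_column'])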
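A sketch that exercises all three options on a small throwaway workbook; the file name, column names, and values are made up for illustration.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "orders.xlsx")
pd.DataFrame({
    "id": [101, 102],
    "customer": ["Acme", "Globex"],
    "amount": [250.0, 410.5],
    "order_date": pd.to_datetime(["2025-01-15", "2025-02-03"]),
}).to_excel(path, index=False)

df = pd.read_excel(
    path,
    dtype={"id": str},           # keep IDs as text instead of inferring numbers
    usecols="A,C,D",             # Excel column letters as one string
    parse_dates=["order_date"],  # ensure a datetime column
)
print(df.dtypes)
```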
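Code Style Guidelines
Important: when generating Python code for Excel operations:
- Write minimal, concise Python code without unnecessary comments
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements
For the Excel files themselves:
- Add comments to cells with complex formulas or important assumptions
- Document data sources for hardcoded values
- Include notes for key calculations and model sections
Resource Files
LICENSE.txt
scripts/office/helpers/__init__.py
scripts/office/helpers/merge_runs.py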
"""Merge adjacent runs with identical formatting in DOCX.
Merges adjacent <w:r> elements that have identical <w:rPr> properties.
Works on runs in paragraphs and inside tracked changes (<w:ins>, <w:del>).
Also:
- Removes rsid attributes from runs (revision metadata that doesn't affect rendering)
- Removes proofErr elements (spell/grammar markers that block merging)
"""
from pathlib import Path

import defusedxml.minidom


def merge_runs(input_dir: str) -> tuple[int, str]:
    doc_xml = Path(input_dir) / "word" / "document.xml"
    if not doc_xml.exists():
        return 0, f"Error: {doc_xml} not found"
    try:
        dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
        root = dom.documentElement
        _remove_elements(root, "proofErr")
        _strip_run_rsid_attrs(root)
        containers = {run.parentNode for run in _find_elements(root, "r")}
        merge_count = 0
        for container in containers:
            merge_count += _merge_runs_in(container)
        doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
        return merge_count, f"Merged {merge_count} runs"
    except Exception as e:
        return 0, f"Error: {e}"


def _find_elements(root, tag: str) -> list:
    results = []

    def traverse(node):
        if node.nodeType == node.ELEMENT_NODE:
            name = node.localName or node.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(node)
            for child in node.childNodes:
                traverse(child)

    traverse(root)
    return results


def _get_child(parent, tag: str):
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name == tag or name.endswith(f":{tag}"):
                return child
    return None


def _get_children(parent, tag: str) -> list:
    results = []
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(child)
    return results


def _is_adjacent(elem1, elem2) -> bool:
    node = elem1.nextSibling
    while node:
        if node == elem2:
            return True
        if node.nodeType == node.ELEMENT_NODE:
            return False
        if node.nodeType == node.TEXT_NODE and node.data.strip():
            return False
        node = node.nextSibling
    return False


def _remove_elements(root, tag: str):
    for elem in _find_elements(root, tag):
        if elem.parentNode:
            elem.parentNode.removeChild(elem)


def _strip_run_rsid_attrs(root):
    for run in _find_elements(root, "r"):
        for attr in list(run.attributes.values()):
            if "rsid" in attr.name.lower():
                run.removeAttribute(attr.name)


def _merge_runs_in(container) -> int:
    merge_count = 0
    run = _first_child_run(container)
    while run:
        while True:
            next_elem = _next_element_sibling(run)
            if next_elem and _is_run(next_elem) and _can_merge(run, next_elem):
                _merge_run_content(run, next_elem)
                container.removeChild(next_elem)
                merge_count += 1
            else:
                break
        _consolidate_text(run)
        run = _next_sibling_run(run)
    return merge_count


def _first_child_run(container):
    for child in container.childNodes:
        if child.nodeType == child.ELEMENT_NODE and _is_run(child):
            return child
    return None


def _next_element_sibling(node):
    sibling = node.nextSibling
    while sibling:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            return sibling
        sibling = sibling.nextSibling
    return None


def _next_sibling_run(node):
    sibling = node.nextSibling
    while sibling:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            if _is_run(sibling):
                return sibling
        sibling = sibling.nextSibling
    return None


def _is_run(node) -> bool:
    name = node.localName or node.tagName
    return name == "r" or name.endswith(":r")


def _can_merge(run1, run2) -> bool:
    rpr1 = _get_child(run1, "rPr")
    rpr2 = _get_child(run2, "rPr")
    if (rpr1 is None) != (rpr2 is None):
        return False
    if rpr1 is None:
        return True
    return rpr1.toxml() == rpr2.toxml()


def _merge_run_content(target, source):
    for child in list(source.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name != "rPr" and not name.endswith(":rPr"):
                target.appendChild(child)


def _consolidate_text(run):
    t_elements = _get_children(run, "t")
    for i in range(len(t_elements) - 1, 0, -1):
        curr, prev = t_elements[i], t_elements[i - 1]
        if _is_adjacent(prev, curr):
            prev_text = prev.firstChild.data if prev.firstChild else ""
            curr_text = curr.firstChild.data if curr.firstChild else ""
            merged = prev_text + curr_text
            if prev.firstChild:
                prev.firstChild.data = merged
            else:
                prev.appendChild(run.ownerDocument.createTextNode(merged))
            if merged.startswith(" ") or merged.endswith(" "):
                prev.setAttribute("xml:space", "preserve")
            elif prev.hasAttribute("xml:space"):
                prev.removeAttribute("xml:space")
            run.removeChild(curr)
scripts/office/helpers/simplify_redlines.py
"""Simplify tracked changes by merging adjacent w:ins or w:del elements.
Merges adjacent <w:ins> elements from the same author into a single element.
Same for <w:del> elements. This makes heavily-redlined documents easier to
work with by reducing the number of tracked change wrappers.
Rules:
- Only merges w:ins with w:ins, w:del with w:del (same element type)
- Only merges if same author (ignores timestamp differences)
- Only merges if truly adjacent (only whitespace between them)
"""
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

import defusedxml.minidom

WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def simplify_redlines(input_dir: str) -> tuple[int, str]:
    doc_xml = Path(input_dir) / "word" / "document.xml"
    if not doc_xml.exists():
        return 0, f"Error: {doc_xml} not found"
    try:
        dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
        root = dom.documentElement
        merge_count = 0
        containers = _find_elements(root, "p") + _find_elements(root, "tc")
        for container in containers:
            merge_count += _merge_tracked_changes_in(container, "ins")
            merge_count += _merge_tracked_changes_in(container, "del")
        doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
        return merge_count, f"Simplified {merge_count} tracked changes"
    except Exception as e:
        return 0, f"Error: {e}"


def _merge_tracked_changes_in(container, tag: str) -> int:
    merge_count = 0
    tracked = [
        child
        for child in container.childNodes
        if child.nodeType == child.ELEMENT_NODE and _is_element(child, tag)
    ]
    if len(tracked) < 2:
        return 0
    i = 0
    while i < len(tracked) - 1:
        curr = tracked[i]
        next_elem = tracked[i + 1]
        if _can_merge_tracked(curr, next_elem):
            _merge_tracked_content(curr, next_elem)
            container.removeChild(next_elem)
            tracked.pop(i + 1)
            merge_count += 1
        else:
            i += 1
    return merge_count


def _is_element(node, tag: str) -> bool:
    name = node.localName or node.tagName
    return name == tag or name.endswith(f":{tag}")


def _get_author(elem) -> str:
    author = elem.getAttribute("w:author")
    if not author:
        for attr in elem.attributes.values():
            if attr.localName == "author" or attr.name.endswith(":author"):
                return attr.value
    return author


def _can_merge_tracked(elem1, elem2) -> bool:
    if _get_author(elem1) != _get_author(elem2):
        return False
    node = elem1.nextSibling
    while node and node != elem2:
        if node.nodeType == node.ELEMENT_NODE:
            return False
        if node.nodeType == node.TEXT_NODE and node.data.strip():
            return False
        node = node.nextSibling
    return True


def _merge_tracked_content(target, source):
    while source.firstChild:
        child = source.firstChild
        source.removeChild(child)
        target.appendChild(child)


def _find_elements(root, tag: str) -> list:
    results = []

    def traverse(node):
        if node.nodeType == node.ELEMENT_NODE:
            name = node.localName or node.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(node)
            for child in node.childNodes:
                traverse(child)

    traverse(root)
    return results


def get_tracked_change_authors(doc_xml_path: Path) -> dict[str, int]:
    if not doc_xml_path.exists():
        return {}
    try:
        tree = ET.parse(doc_xml_path)
        root = tree.getroot()
    except ET.ParseError:
        return {}
    namespaces = {"w": WORD_NS}
    author_attr = f"{{{WORD_NS}}}author"
    authors: dict[str, int] = {}
    for tag in ["ins", "del"]:
        for elem in root.findall(f".//w:{tag}", namespaces):
            author = elem.get(author_attr)
            if author:
                authors[author] = authors.get(author, 0) + 1
    return authors


def _get_authors_from_docx(docx_path: Path) -> dict[str, int]:
    try:
        with zipfile.ZipFile(docx_path, "r") as zf:
            if "word/document.xml" not in zf.namelist():
                return {}
            with zf.open("word/document.xml") as f:
                tree = ET.parse(f)
                root = tree.getroot()
        namespaces = {"w": WORD_NS}
        author_attr = f"{{{WORD_NS}}}author"
        authors: dict[str, int] = {}
        for tag in ["ins", "del"]:
            for elem in root.findall(f".//w:{tag}", namespaces):
                author = elem.get(author_attr)
                if author:
                    authors[author] = authors.get(author, 0) + 1
        return authors
    except (zipfile.BadZipFile, ET.ParseError):
        return {}


def infer_author(modified_dir: Path, original_docx: Path, default: str = "Claude") -> str:
    modified_xml = modified_dir / "word" / "document.xml"
    modified_authors = get_tracked_change_authors(modified_xml)
    if not modified_authors:
        return default
    original_authors = _get_authors_from_docx(original_docx)
    new_changes: dict[str, int] = {}
    for author, count in modified_authors.items():
        original_count = original_authors.get(author, 0)
        diff = count - original_count
        if diff > 0:
            new_changes[author] = diff
    if not new_changes:
        return default
    if len(new_changes) == 1:
        return next(iter(new_changes))
    raise ValueError(
        f"Multiple authors added new changes: {new_changes}. "
        "Cannot infer which author to validate."
    )
scripts/office/pack.py
"""Pack a directory into a DOCX, PPTX, or XLSX file.
Validates with auto-repair, condenses XML formatting, and creates the Office file.
Usage:
python pack.py <input_directory> <output_file> [--original <file>] [--validate true|false]
Examples:
python pack.py unpacked/ output.docx --original input.docx
python pack.py unpacked/ output.pptx --validate false
"""
import argparse
import sys
import shutil
import tempfile
import zipfile
from pathlib import Path

import defusedxml.minidom

from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator


def pack(
    input_directory: str,
    output_file: str,
    original_file: str | None = None,
    validate: bool = True,
    infer_author_func=None,
) -> tuple[None, str]:
    input_dir = Path(input_directory)
    output_path = Path(output_file)
    suffix = output_path.suffix.lower()
    if not input_dir.is_dir():
        return None, f"Error: {input_dir} is not a directory"
    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {output_file} must be a .docx, .pptx, or .xlsx file"
    if validate and original_file:
        original_path = Path(original_file)
        if original_path.exists():
            success, output = _run_validation(
                input_dir, original_path, suffix, infer_author_func
            )
            if output:
                print(output)
            if not success:
                return None, f"Error: Validation failed for {input_dir}"
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_content_dir = Path(temp_dir) / "content"
        shutil.copytree(input_dir, temp_content_dir)
        for pattern in ["*.xml", "*.rels"]:
            for xml_file in temp_content_dir.rglob(pattern):
                _condense_xml(xml_file)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in temp_content_dir.rglob("*"):
                if f.is_file():
                    zf.write(f, f.relative_to(temp_content_dir))
    return None, f"Successfully packed {input_dir} to {output_file}"


def _run_validation(
    unpacked_dir: Path,
    original_file: Path,
    suffix: str,
    infer_author_func=None,
) -> tuple[bool, str | None]:
    output_lines = []
    validators = []
    if suffix == ".docx":
        author = "Claude"
        if infer_author_func:
            try:
                author = infer_author_func(unpacked_dir, original_file)
            except ValueError as e:
                print(f"Warning: {e} Using default author 'Claude'.", file=sys.stderr)
        validators = [
            DOCXSchemaValidator(unpacked_dir, original_file),
            RedliningValidator(unpacked_dir, original_file, author=author),
        ]
    elif suffix == ".pptx":
        validators = [PPTXSchemaValidator(unpacked_dir, original_file)]
    if not validators:
        return True, None
    total_repairs = sum(v.repair() for v in validators)
    if total_repairs:
        output_lines.append(f"Auto-repaired {total_repairs} issue(s)")
    success = all(v.validate() for v in validators)
    if success:
        output_lines.append("All validations PASSED!")
    return success, "\n".join(output_lines) if output_lines else None


def _condense_xml(xml_file: Path) -> None:
    try:
        with open(xml_file, encoding="utf-8") as f:
            dom = defusedxml.minidom.parse(f)
        for element in dom.getElementsByTagName("*"):
            if element.tagName.endswith(":t"):
                continue
            for child in list(element.childNodes):
                if (
                    child.nodeType == child.TEXT_NODE
                    and child.nodeValue
                    and child.nodeValue.strip() == ""
                ) or child.nodeType == child.COMMENT_NODE:
                    element.removeChild(child)
        xml_file.write_bytes(dom.toxml(encoding="UTF-8"))
    except Exception as e:
        print(f"ERROR: Failed to parse {xml_file.name}: {e}", file=sys.stderr)
        raise


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Pack a directory into a DOCX, PPTX, or XLSX file"
    )
    parser.add_argument("input_directory", help="Unpacked Office document directory")
    parser.add_argument("output_file", help="Output Office file (.docx/.pptx/.xlsx)")
    parser.add_argument(
        "--original",
        help="Original file for validation comparison",
    )
    parser.add_argument(
        "--validate",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Run validation with auto-repair (default: true)",
    )
    args = parser.parse_args()
    _, message = pack(
        args.input_directory,
        args.output_file,
        original_file=args.original,
        validate=args.validate,
    )
    print(message)
    if "Error" in message:
        sys.exit(1)
scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd
scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd
scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd
scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd
scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd
scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd
scripts/office/schemas/mce/mc.xsd
scripts/office/schemas/microsoft/wml-2010.xsd
scripts/office/schemas/microsoft/wml-2012.xsd
scripts/office/schemas/microsoft/wml-2018.xsd
scripts/office/schemas/microsoft/wml-cex-2018.xsd
scripts/office/schemas/microsoft/wml-cid-2016.xsd
scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd
scripts/office/schemas/microsoft/wml-symex-2015.xsd
scripts/office/soffice.py
"""
Helper for running LibreOffice (soffice) in environments where AF_UNIX
sockets may be blocked (e.g., sandboxed VMs). Detects the restriction
at runtime and applies an LD_PRELOAD shim if needed.
Usage:
from office.soffice import run_soffice, get_soffice_env
# Option 1 – run soffice directly
result = run_soffice(["--headless", "--convert-to", "pdf", "input.docx"])
# Option 2 – get env dict for your own subprocess calls
env = get_soffice_env()
subprocess.run(["soffice", ...], env=env)
"""
import os
import socket
import subprocess
import tempfile
from pathlib import Path


def get_soffice_env() -> dict:
    env = os.environ.copy()
    env["SAL_USE_VCLPLUGIN"] = "svp"
    if _needs_shim():
        shim = _ensure_shim()
        env["LD_PRELOAD"] = str(shim)
    return env


def run_soffice(args: list[str], **kwargs) -> subprocess.CompletedProcess:
    env = get_soffice_env()
    return subprocess.run(["soffice"] + args, env=env, **kwargs)


_SHIM_SO = Path(tempfile.gettempdir()) / "lo_socket_shim.so"


def _needs_shim() -> bool:
    try:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.close()
        return False
    except OSError:
        return True


def _ensure_shim() -> Path:
    if _SHIM_SO.exists():
        return _SHIM_SO
    src = Path(tempfile.gettempdir()) / "lo_socket_shim.c"
    src.write_text(_SHIM_SOURCE)
    subprocess.run(
        ["gcc", "-shared", "-fPIC", "-o", str(_SHIM_SO), str(src), "-ldl"],
        check=True,
        capture_output=True,
    )
    src.unlink()
    return _SHIM_SO


_SHIM_SOURCE = r"""
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

static int (*real_socket)(int, int, int);
static int (*real_socketpair)(int, int, int, int[2]);
static int (*real_listen)(int, int);
static int (*real_accept)(int, struct sockaddr *, socklen_t *);
static int (*real_close)(int);
static int (*real_read)(int, void *, size_t);

/* Per-FD bookkeeping (FDs >= 1024 are passed through unshimmed). */
static int is_shimmed[1024];
static int peer_of[1024];
static int wake_r[1024]; /* accept() blocks reading this */
static int wake_w[1024]; /* close() writes to this */
static int listener_fd = -1; /* FD that received listen() */

__attribute__((constructor))
static void init(void) {
    real_socket = dlsym(RTLD_NEXT, "socket");
    real_socketpair = dlsym(RTLD_NEXT, "socketpair");
    real_listen = dlsym(RTLD_NEXT, "listen");
    real_accept = dlsym(RTLD_NEXT, "accept");
    real_close = dlsym(RTLD_NEXT, "close");
    real_read = dlsym(RTLD_NEXT, "read");
    for (int i = 0; i < 1024; i++) {
        peer_of[i] = -1;
        wake_r[i] = -1;
        wake_w[i] = -1;
    }
}

/* ---- socket ---------------------------------------------------------- */
int socket(int domain, int type, int protocol) {
    if (domain == AF_UNIX) {
        int fd = real_socket(domain, type, protocol);
        if (fd >= 0) return fd;
        /* socket(AF_UNIX) blocked – fall back to socketpair(). */
        int sv[2];
        if (real_socketpair(domain, type, protocol, sv) == 0) {
            if (sv[0] >= 0 && sv[0] < 1024) {
                is_shimmed[sv[0]] = 1;
                peer_of[sv[0]] = sv[1];
                int wp[2];
                if (pipe(wp) == 0) {
                    wake_r[sv[0]] = wp[0];
                    wake_w[sv[0]] = wp[1];
                }
            }
            return sv[0];
        }
        errno = EPERM;
        return -1;
    }
    return real_socket(domain, type, protocol);
}

/* ---- listen ---------------------------------------------------------- */
int listen(int sockfd, int backlog) {
    if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
        listener_fd = sockfd;
        return 0;
    }
    return real_listen(sockfd, backlog);
}

/* ---- accept ---------------------------------------------------------- */
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen) {
    if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
        /* Block until close() writes to the wake pipe. */
        if (wake_r[sockfd] >= 0) {
            char buf;
            real_read(wake_r[sockfd], &buf, 1);
        }
        errno = ECONNABORTED;
        return -1;
    }
    return real_accept(sockfd, addr, addrlen);
}

/* ---- close ----------------------------------------------------------- */
int close(int fd) {
    if (fd >= 0 && fd < 1024 && is_shimmed[fd]) {
        int was_listener = (fd == listener_fd);
        is_shimmed[fd] = 0;
        if (wake_w[fd] >= 0) { /* unblock accept() */
            char c = 0;
            write(wake_w[fd], &c, 1);
            real_close(wake_w[fd]);
            wake_w[fd] = -1;
        }
        if (wake_r[fd] >= 0) { real_close(wake_r[fd]); wake_r[fd] = -1; }
        if (peer_of[fd] >= 0) { real_close(peer_of[fd]); peer_of[fd] = -1; }
        if (was_listener)
            _exit(0); /* conversion done – exit */
    }
    return real_close(fd);
}
"""


if __name__ == "__main__":
    import sys

    result = run_soffice(sys.argv[1:])
    sys.exit(result.returncode)
scripts/office/unpack.py
"""Unpack Office files (DOCX, PPTX, XLSX) for editing.
Extracts the ZIP archive, pretty-prints XML files, and optionally:
- Merges adjacent runs with identical formatting (DOCX only)
- Simplifies adjacent tracked changes from same author (DOCX only)
Usage:
python unpack.py <office_file> <output_dir> [options]
Examples:
python unpack.py document.docx unpacked/
python unpack.py presentation.pptx unpacked/
python unpack.py document.docx unpacked/ --merge-runs false
"""
import argparse
import sys
import zipfile
from pathlib import Path

import defusedxml.minidom

from helpers.merge_runs import merge_runs as do_merge_runs
from helpers.simplify_redlines import simplify_redlines as do_simplify_redlines

# Smart quotes are written back as numeric character references
SMART_QUOTE_REPLACEMENTS = {
    "\u201c": "&#x201C;",
    "\u201d": "&#x201D;",
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
}


def unpack(
    input_file: str,
    output_directory: str,
    merge_runs: bool = True,
    simplify_redlines: bool = True,
) -> tuple[None, str]:
    input_path = Path(input_file)
    output_path = Path(output_directory)
    suffix = input_path.suffix.lower()
    if not input_path.exists():
        return None, f"Error: {input_file} does not exist"
    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {input_file} must be a .docx, .pptx, or .xlsx file"
    try:
        output_path.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(input_path, "r") as zf:
            zf.extractall(output_path)
        xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels"))
        for xml_file in xml_files:
            _pretty_print_xml(xml_file)
        message = f"Unpacked {input_file} ({len(xml_files)} XML files)"
        if suffix == ".docx":
            if simplify_redlines:
                simplify_count, _ = do_simplify_redlines(str(output_path))
                message += f", simplified {simplify_count} tracked changes"
            if merge_runs:
                merge_count, _ = do_merge_runs(str(output_path))
                message += f", merged {merge_count} runs"
        for xml_file in xml_files:
            _escape_smart_quotes(xml_file)
        return None, message
    except zipfile.BadZipFile:
        return None, f"Error: {input_file} is not a valid Office file"
    except Exception as e:
        return None, f"Error unpacking: {e}"


def _pretty_print_xml(xml_file: Path) -> None:
    try:
        content = xml_file.read_text(encoding="utf-8")
        dom = defusedxml.minidom.parseString(content)
        xml_file.write_bytes(dom.toprettyxml(indent=" ", encoding="utf-8"))
    except Exception:
        pass


def _escape_smart_quotes(xml_file: Path) -> None:
    try:
        content = xml_file.read_text(encoding="utf-8")
        for char, entity in SMART_QUOTE_REPLACEMENTS.items():
            content = content.replace(char, entity)
        xml_file.write_text(content, encoding="utf-8")
    except Exception:
        pass


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Unpack an Office file (DOCX, PPTX, XLSX) for editing"
    )
    parser.add_argument("input_file", help="Office file to unpack")
    parser.add_argument("output_directory", help="Output directory")
    parser.add_argument(
        "--merge-runs",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent runs with identical formatting (DOCX only, default: true)",
    )
    parser.add_argument(
        "--simplify-redlines",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent tracked changes from same author (DOCX only, default: true)",
    )
    args = parser.parse_args()
    _, message = unpack(
        args.input_file,
        args.output_directory,
        merge_runs=args.merge_runs,
        simplify_redlines=args.simplify_redlines,
    )
    print(message)
    if "Error" in message:
        sys.exit(1)
scripts/office/validate.py
"""
Command line tool to validate Office document XML files against XSD schemas and tracked changes.
Usage:
python validate.py <path> [--original <original_file>] [--auto-repair] [--author NAME]
The first argument can be either:
- An unpacked directory containing the Office document XML files
- A packed Office file (.docx/.pptx/.xlsx) which will be unpacked to a temp directory
Auto-repair fixes:
- paraId/durableId values that exceed OOXML limits
- Missing xml:space="preserve" on w:t elements with whitespace
"""
import argparse
import sys
import tempfile
import zipfile
from pathlib import Path
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
def main():
    parser = argparse.ArgumentParser(description="Validate Office document XML files")
    parser.add_argument(
        "path",
        help="Path to unpacked directory or packed Office file (.docx/.pptx/.xlsx)",
    )
    parser.add_argument(
        "--original",
        required=False,
        default=None,
        help="Path to original file (.docx/.pptx/.xlsx). If omitted, all XSD errors are reported and redlining validation is skipped.",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Enable verbose output",
    )
    parser.add_argument(
        "--auto-repair",
        action="store_true",
        help="Automatically repair common issues (hex IDs, whitespace preservation)",
    )
    parser.add_argument(
        "--author",
        default="Claude",
        help="Author name for redlining validation (default: Claude)",
    )
    args = parser.parse_args()
    path = Path(args.path)
    assert path.exists(), f"Error: {path} does not exist"
    original_file = None
    if args.original:
        original_file = Path(args.original)
        assert original_file.is_file(), f"Error: {original_file} is not a file"
        assert original_file.suffix.lower() in [".docx", ".pptx", ".xlsx"], (
            f"Error: {original_file} must be a .docx, .pptx, or .xlsx file"
        )
    # The --original file's suffix takes precedence over the path's suffix.
    file_extension = (original_file or path).suffix.lower()
    assert file_extension in [".docx", ".pptx", ".xlsx"], (
        f"Error: Cannot determine file type from {path}. Use --original or provide a .docx/.pptx/.xlsx file."
    )
    # A packed Office file is extracted to a temporary directory first.
    if path.is_file() and path.suffix.lower() in [".docx", ".pptx", ".xlsx"]:
        temp_dir = tempfile.mkdtemp()
        with zipfile.ZipFile(path, "r") as zf:
            zf.extractall(temp_dir)
        unpacked_dir = Path(temp_dir)
    else:
        assert path.is_dir(), f"Error: {path} is not a directory or Office file"
        unpacked_dir = path
    match file_extension:
        case ".docx":
            validators = [
                DOCXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
            ]
            if original_file:
                validators.append(
                    RedliningValidator(unpacked_dir, original_file, verbose=args.verbose, author=args.author)
                )
        case ".pptx":
            validators = [
                PPTXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
            ]
        case _:
            # .xlsx passes the suffix checks above but has no validator branch.
            print(f"Error: Validation not supported for file type {file_extension}")
            sys.exit(1)
    if args.auto_repair:
        total_repairs = sum(v.repair() for v in validators)
        if total_repairs:
            print(f"Auto-repaired {total_repairs} issue(s)")
    success = all(v.validate() for v in validators)
    if success:
        print("All validations PASSED!")
    sys.exit(0 if success else 1)
if __name__ == "__main__":
    main()
scripts/office/validators/__init__.py
"""
Validation modules for Word document processing.
"""
from .base import BaseSchemaValidator
from .docx import DOCXSchemaValidator
from .pptx import PPTXSchemaValidator
from .redlining import RedliningValidator
__all__ = [
    "BaseSchemaValidator",
    "DOCXSchemaValidator",
    "PPTXSchemaValidator",
    "RedliningValidator",
]
scripts/office/validators/base.py
(binary resource, not shown)
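The contents of base.py are not reproduced here. Judging only from how validate.py uses the validators (`v.repair()` results are summed as a count, `v.validate()` results are combined with `all()`), each validator needs at least a `repair() -> int` and a `validate() -> bool` method. The sketch below states that contract as a `typing.Protocol`; the `Validator`, `TextRun`, `WhitespaceValidator`, and `run_all` names and the toy whitespace rule are hypothetical illustrations, not the real `BaseSchemaValidator` API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Validator(Protocol):
    """Hypothetical contract inferred from how validate.py calls validators."""

    def repair(self) -> int: ...  # number of issues fixed

    def validate(self) -> bool: ...  # True when no problems remain


@dataclass
class TextRun:
    # Hypothetical stand-in for a <w:t> element: its text and whether
    # xml:space="preserve" is set on it.
    text: str
    preserve_space: bool = False


@dataclass
class WhitespaceValidator:
    """Toy rule: text with leading/trailing whitespace must preserve space."""

    runs: list[TextRun] = field(default_factory=list)

    def _needs_fix(self, run: TextRun) -> bool:
        return run.text != run.text.strip() and not run.preserve_space

    def repair(self) -> int:
        fixed = 0
        for run in self.runs:
            if self._needs_fix(run):
                run.preserve_space = True
                fixed += 1
        return fixed

    def validate(self) -> bool:
        return not any(self._needs_fix(run) for run in self.runs)


def run_all(validators: list[Validator], auto_repair: bool) -> bool:
    # Same control flow as validate.py: optional repair pass, then validate all.
    if auto_repair:
        total = sum(v.repair() for v in validators)
        if total:
            print(f"Auto-repaired {total} issue(s)")
    return all(v.validate() for v in validators)


if __name__ == "__main__":
    v = WhitespaceValidator([TextRun(" lead"), TextRun("ok"), TextRun("tail ", True)])
    print(run_all([v], auto_repair=False))  # False
    print(run_all([v], auto_repair=True))   # prints "Auto-repaired 1 issue(s)", then True
```

Because repair runs before validation, `--auto-repair` can turn a failing run into a passing one without any manual edits.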
scripts/office/validators/docx.py
(binary resource, not shown)
scripts/office/validators/pptx.py
(binary resource, not shown)
scripts/office/validators/redlining.py
(binary resource, not shown)
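validate.py resolves the document type before choosing validators, and its `match` statement only has branches for .docx and .pptx. A minimal pure-function sketch of that decision (the names `resolve_file_type` and `validators_for` are hypothetical, not part of the scripts) makes the consequence for spreadsheets explicit: an .xlsx file passes the suffix check but gets no validator, so the script exits with "Validation not supported".

```python
from pathlib import Path
from typing import Optional

SUPPORTED_SUFFIXES = {".docx", ".pptx", ".xlsx"}


def resolve_file_type(path: str, original: Optional[str] = None) -> str:
    """Pick the suffix that decides which validators run.

    Same precedence as validate.py: the --original file wins; otherwise the
    suffix of the path being validated is used (an unpacked directory usually
    has none, which is why --original is needed in that case).
    """
    ext = Path(original or path).suffix.lower()
    if ext not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Cannot determine file type from {path}; pass --original")
    return ext


def validators_for(ext: str, has_original: bool) -> list[str]:
    """Names of the validators validate.py would build; [] means unsupported."""
    if ext == ".docx":
        names = ["DOCXSchemaValidator"]
        if has_original:
            # Redlining checks need the original file to compare against.
            names.append("RedliningValidator")
        return names
    if ext == ".pptx":
        return ["PPTXSchemaValidator"]
    # .xlsx falls through to the `case _` branch in validate.py.
    return []


if __name__ == "__main__":
    ext = resolve_file_type("out/unpacked", original="model.xlsx")
    print(ext, validators_for(ext, has_original=True))  # .xlsx []
```

For an .xlsx deliverable, formula correctness is therefore checked with scripts/recalc.py rather than validate.py.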
scripts/recalc.py
"""
Excel Formula Recalculation Script
Recalculates all formulas in an Excel file using LibreOffice
"""
import json
import os
import platform
import subprocess
import sys
from pathlib import Path
from office.soffice import get_soffice_env
from openpyxl import load_workbook
MACRO_DIR_MACOS = "~/Library/Application Support/LibreOffice/4/user/basic/Standard"
MACRO_DIR_LINUX = "~/.config/libreoffice/4/user/basic/Standard"
MACRO_FILENAME = "Module1.xba"
RECALCULATE_MACRO = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Module1" script:language="StarBasic">
Sub RecalculateAndSave()
ThisComponent.calculateAll()
ThisComponent.store()
ThisComponent.close(True)
End Sub
</script:module>"""
def has_gtimeout():
    # macOS has no GNU `timeout`; GNU coreutils (e.g. from Homebrew) provides it as `gtimeout`.
    try:
        subprocess.run(
            ["gtimeout", "--version"], capture_output=True, timeout=1, check=False
        )
        return True
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
def setup_libreoffice_macro():
    macro_dir = os.path.expanduser(
        MACRO_DIR_MACOS if platform.system() == "Darwin" else MACRO_DIR_LINUX
    )
    macro_file = os.path.join(macro_dir, MACRO_FILENAME)
    # Macro already installed: nothing to do.
    if (
        os.path.exists(macro_file)
        and "RecalculateAndSave" in Path(macro_file).read_text()
    ):
        return True
    # Start LibreOffice once so it creates its user profile directory.
    if not os.path.exists(macro_dir):
        subprocess.run(
            ["soffice", "--headless", "--terminate_after_init"],
            capture_output=True,
            timeout=10,
            env=get_soffice_env(),
        )
        os.makedirs(macro_dir, exist_ok=True)
    # Note: this overwrites any existing Module1.xba in the Standard library.
    try:
        Path(macro_file).write_text(RECALCULATE_MACRO)
        return True
    except Exception:
        return False
def recalc(filename, timeout=30):
    if not Path(filename).exists():
        return {"error": f"File {filename} does not exist"}
    abs_path = str(Path(filename).absolute())
    if not setup_libreoffice_macro():
        return {"error": "Failed to setup LibreOffice macro"}
    # Open the file headlessly and run the macro, which recalculates and saves in place.
    cmd = [
        "soffice",
        "--headless",
        "--norestore",
        "vnd.sun.star.script:Standard.Module1.RecalculateAndSave?language=Basic&location=application",
        abs_path,
    ]
    # Bound the run time where a timeout command is available.
    if platform.system() == "Linux":
        cmd = ["timeout", str(timeout)] + cmd
    elif platform.system() == "Darwin" and has_gtimeout():
        cmd = ["gtimeout", str(timeout)] + cmd
    result = subprocess.run(cmd, capture_output=True, text=True, env=get_soffice_env())
    # 124 is the exit status timeout returns when the limit is hit; it is not
    # treated as an error here, so the file is still scanned.
    if result.returncode != 0 and result.returncode != 124:
        error_msg = result.stderr or "Unknown error during recalculation"
        if "Module1" in error_msg or "RecalculateAndSave" not in error_msg:
            return {"error": "LibreOffice macro not configured properly"}
        return {"error": error_msg}
    try:
        # data_only=True reads the cached values LibreOffice just stored.
        wb = load_workbook(filename, data_only=True)
        excel_errors = [
            "#VALUE!",
            "#DIV/0!",
            "#REF!",
            "#NAME?",
            "#NULL!",
            "#NUM!",
            "#N/A",
        ]
        error_details = {err: [] for err in excel_errors}
        total_errors = 0
        for sheet_name in wb.sheetnames:
            ws = wb[sheet_name]
            for row in ws.iter_rows():
                for cell in row:
                    if cell.value is not None and isinstance(cell.value, str):
                        for err in excel_errors:
                            if err in cell.value:
                                location = f"{sheet_name}!{cell.coordinate}"
                                error_details[err].append(location)
                                total_errors += 1
                                break
        wb.close()
        result = {
            "status": "success" if total_errors == 0 else "errors_found",
            "total_errors": total_errors,
            "error_summary": {},
        }
        # Report at most 20 locations per error type.
        for err_type, locations in error_details.items():
            if locations:
                result["error_summary"][err_type] = {
                    "count": len(locations),
                    "locations": locations[:20],
                }
        # Reload without data_only to see the formula strings and count them.
        wb_formulas = load_workbook(filename, data_only=False)
        formula_count = 0
        for sheet_name in wb_formulas.sheetnames:
            ws = wb_formulas[sheet_name]
            for row in ws.iter_rows():
                for cell in row:
                    if (
                        cell.value
                        and isinstance(cell.value, str)
                        and cell.value.startswith("=")
                    ):
                        formula_count += 1
        wb_formulas.close()
        result["total_formulas"] = formula_count
        return result
    except Exception as e:
        return {"error": str(e)}
def main():
    if len(sys.argv) < 2:
        print("Usage: python recalc.py <excel_file> [timeout_seconds]")
        print("\nRecalculates all formulas in an Excel file using LibreOffice")
        print("\nReturns JSON with error details:")
        print(" - status: 'success' or 'errors_found'")
        print(" - total_errors: Total number of Excel errors found")
        print(" - total_formulas: Number of formulas in the file")
        print(" - error_summary: Breakdown by error type with locations")
        print(" - #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A")
        sys.exit(1)
    filename = sys.argv[1]
    timeout = int(sys.argv[2]) if len(sys.argv) > 2 else 30
    result = recalc(filename, timeout)
    print(json.dumps(result, indent=2))
if __name__ == "__main__":
    main()
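The error scan at the end of `recalc()` needs openpyxl and a LibreOffice-recalculated file before it can be tried out. As a stdlib-only sketch, assuming the cached cell values have already been read into a plain dict (the `sheets` layout and the name `summarize_errors` are hypothetical), the same `status` / `total_errors` / `error_summary` report, including the cap of 20 locations per error type, can be reproduced like this:

```python
EXCEL_ERRORS = ["#VALUE!", "#DIV/0!", "#REF!", "#NAME?", "#NULL!", "#NUM!", "#N/A"]


def summarize_errors(sheets: dict[str, dict[str, object]], max_locations: int = 20) -> dict:
    """Build a recalc.py-style report from cached cell values.

    `sheets` maps sheet name -> {cell coordinate: cached value}; it stands in
    for the workbook that recalc.py loads with openpyxl's data_only=True.
    """
    details: dict[str, list[str]] = {err: [] for err in EXCEL_ERRORS}
    total = 0
    for sheet_name, cells in sheets.items():
        for coord, value in cells.items():
            if not isinstance(value, str):
                continue  # numbers, dates, None cannot hold an error string
            for err in EXCEL_ERRORS:
                # Substring match, first hit wins, as in recalc.py.
                if err in value:
                    details[err].append(f"{sheet_name}!{coord}")
                    total += 1
                    break
    summary = {
        err: {"count": len(locs), "locations": locs[:max_locations]}
        for err, locs in details.items()
        if locs
    }
    return {
        "status": "success" if total == 0 else "errors_found",
        "total_errors": total,
        "error_summary": summary,
    }


if __name__ == "__main__":
    cached = {
        "Model": {"B5": 100, "B6": "#DIV/0!", "C6": "#DIV/0!"},
        "Inputs": {"A1": "Revenue ($mm)", "B2": "#REF!"},
    }
    print(summarize_errors(cached)["total_errors"])  # 3
```

Because the test is a substring match, a text cell that merely mentions an error code (for example a note reading "replace #N/A") is also counted, exactly as in the original loop; the count still reports every hit even when the location list is truncated.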
PPTX
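Use this skill any time a .pptx file is involved in any way, whether as input, output, or both. This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, such as in an email or summary); editing, modifying, or updating existing presentations; combining or splitting slide files; and working with templates, layouts, speaker notes, or comments. Trigger whenever the user mentions "deck", "slides", "presentation", or references a .pptx filename, regardless of what they plan to do with the content afterward. If a .pptx file needs to be opened, created, or touched, use this skill.
Brand Guidelines
An agent skill playbook for assembling brand voice, visual rules, and reusable assets so that Claude Skills produce consistent creative output.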