Docx
Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of "Word doc", "Word document", ".docx", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use it when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a "report", "memo", "letter", "template", or similar deliverable as a Word or .docx file, use this skill. Do NOT use it for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.
Source: Content adapted from anthropics/skills (MIT).
Overview
A .docx file is a ZIP archive containing XML files.
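Because a .docx is an ordinary ZIP archive, Python's standard zipfile module can list its parts. The sketch below is illustrative only: list_parts and make_stub_docx are hypothetical helper names, and the stub archive it builds in memory is not a valid Word document. With a real file you would pass its path to list_parts instead.

```python
import io
import zipfile


def list_parts(docx_file) -> list[str]:
    """Return the sorted names of the parts stored in a .docx (ZIP) container."""
    with zipfile.ZipFile(docx_file) as zf:
        return sorted(zf.namelist())


def make_stub_docx() -> io.BytesIO:
    """Build an in-memory ZIP with illustrative part names (not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name in list_parts(make_stub_docx()):
        print(name)
```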
Quick Reference
| Task | Approach |
|---|---|
| Read/analyze content | pandoc or unpack for raw XML |
| Create new document | Use docx-js; see Creating New Documents below |
| Edit existing document | Unpack -> edit XML -> repack; see Editing Existing Documents below |
Converting .doc to .docx
Legacy .doc files must be converted before editing:
python scripts/office/soffice.py --headless --convert-to docx document.doc
Reading Content
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md
# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
Converting to Images
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
Accepting Tracked Changes
To produce a clean document with all tracked changes accepted (requires LibreOffice):
python scripts/accept_changes.py input.docx output.docx
Creating New Documents
Generate .docx files with JavaScript, then validate them. Install: npm install -g docx
Setup
const fs = require('fs'); // needed for fs.writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
TabStopType, TabStopPosition, Column, SectionType,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
VerticalAlign, PageNumber, PageBreak } = require('docx');
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
Validation
After creating the file, validate it. If validation fails, unpack it, fix the XML, and repack it.
python scripts/office/validate.py doc.docx
Page Size
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
properties: {
page: {
size: {
width: 12240, // 8.5 inches in DXA
height: 15840 // 11 inches in DXA
},
margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
}
},
children: [/* content */]
}]
Common page sizes (DXA units, 1440 DXA = 1 inch):
| Paper | Width | Height | Content Width (1" margins) |
|---|---|---|---|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |
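The content-width column follows from simple arithmetic: page width minus the left and right margins, all in DXA. A minimal sketch of that calculation, using hypothetical helper names that are not part of docx-js or the skill's scripts:

```python
DXA_PER_INCH = 1440  # 1440 DXA = 1 inch, as stated above


def inches_to_dxa(inches: float) -> int:
    """Convert a length in inches to DXA units."""
    return round(inches * DXA_PER_INCH)


def content_width_dxa(page_width: int, left_margin: int, right_margin: int) -> int:
    """Usable width between the left and right margins, in DXA."""
    return page_width - left_margin - right_margin


if __name__ == "__main__":
    margin = inches_to_dxa(1)
    print("US Letter:", content_width_dxa(inches_to_dxa(8.5), margin, margin))
    print("A4:", content_width_dxa(11906, margin, margin))
```

The results match the table: 9360 DXA for US Letter and 9026 DXA for A4 with 1-inch margins.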
Landscape orientation: docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
size: {
width: 12240, // Pass SHORT edge as width
height: 15840, // Pass LONG edge as height
orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
Styles (Override Built-in Headings)
Use Arial as the default font (universally supported). Keep titles black for readability.
const doc = new Document({
styles: {
default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
paragraphStyles: [
// IMPORTANT: Use exact IDs to override built-in styles
{ id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 32, bold: true, font: "Arial" },
paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
{ id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 28, bold: true, font: "Arial" },
paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
]
},
sections: [{
children: [
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
]
}]
});
Lists (NEVER use unicode bullets)
// WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] }) // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] }) // BAD
// CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
numbering: {
config: [
{ reference: "bullets",
levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
{ reference: "numbers",
levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
]
},
sections: [{
children: [
new Paragraph({ numbering: { reference: "bullets", level: 0 },
children: [new TextRun("Bullet item")] }),
new Paragraph({ numbering: { reference: "numbers", level: 0 },
children: [new TextRun("Numbered item")] }),
]
}]
});
// Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
Tables
CRITICAL: Tables need dual widths - set columnWidths on the table AND width on each cell. Without both, tables render incorrectly on some platforms.
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };
new Table({
width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
rows: [
new TableRow({
children: [
new TableCell({
borders,
width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
children: [new Paragraph({ children: [new TextRun("Cell")] })]
})
]
})
]
})
Table width calculation:
Always use WidthType.DXA - WidthType.PERCENTAGE breaks in Google Docs.
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
Width rules:
- Always use WidthType.DXA - never WidthType.PERCENTAGE (incompatible with Google Docs)
- Table width must equal the sum of columnWidths
- Cell width must match the corresponding columnWidth
- Cell margins are internal padding - they reduce the content area, they do not add to the cell width
- For full-width tables: use the content width (page width minus left and right margins)
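The width rules above are easy to check before generating a table. The following sketch uses a hypothetical helper, check_table_widths, that is not part of docx-js. It assumes every row has one cell per column (no merged cells) and that all widths are plain DXA integers.

```python
def check_table_widths(
    table_width: int,
    column_widths: list[int],
    row_cell_widths: list[list[int]],
) -> list[str]:
    """Return a list of rule violations; an empty list means the widths agree."""
    problems = []
    # Rule: table width must equal the sum of columnWidths.
    if sum(column_widths) != table_width:
        problems.append(
            f"columnWidths sum to {sum(column_widths)}, table width is {table_width}"
        )
    # Rule: each cell width must match the corresponding columnWidth.
    for row_index, cells in enumerate(row_cell_widths):
        if cells != column_widths:
            problems.append(
                f"row {row_index}: cell widths {cells} != columnWidths {column_widths}"
            )
    return problems


if __name__ == "__main__":
    # US Letter with 1-inch margins: content width 9360 DXA.
    print(check_table_widths(9360, [7000, 2360], [[7000, 2360]]))
    print(check_table_widths(9360, [7000, 2000], [[7000, 2360]]))
```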
Images
// CRITICAL: type parameter is REQUIRED
new Paragraph({
children: [new ImageRun({
type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
data: fs.readFileSync("image.png"),
transformation: { width: 200, height: 150 },
altText: { title: "Title", description: "Desc", name: "Name" } // All three required
})]
})
Page Breaks
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })
// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
Hyperlinks
// External link
new Paragraph({
children: [new ExternalHyperlink({
children: [new TextRun({ text: "Click here", style: "Hyperlink" })],
link: "https://example.com",
})]
})
// Internal link (bookmark + reference)
// 1. Create bookmark at destination
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [
new Bookmark({ id: "chapter1", children: [new TextRun("Chapter 1")] }),
]})
// 2. Link to it
new Paragraph({ children: [new InternalHyperlink({
children: [new TextRun({ text: "See Chapter 1", style: "Hyperlink" })],
anchor: "chapter1",
})]})
Footnotes
const doc = new Document({
footnotes: {
1: { children: [new Paragraph("Source: Annual Report 2024")] },
2: { children: [new Paragraph("See appendix for methodology")] },
},
sections: [{
children: [new Paragraph({
children: [
new TextRun("Revenue grew 15%"),
new FootnoteReferenceRun(1),
new TextRun(" using adjusted metrics"),
new FootnoteReferenceRun(2),
],
})]
}]
});
Tab Stops
// Right-align text on same line (e.g., date opposite a title)
new Paragraph({
children: [
new TextRun("Company Name"),
new TextRun("\tJanuary 2025"),
],
tabStops: [{ type: TabStopType.RIGHT, position: TabStopPosition.MAX }],
})
// Dot leader (e.g., TOC-style)
new Paragraph({
children: [
new TextRun("Introduction"),
new TextRun({ children: [
new PositionalTab({
alignment: PositionalTabAlignment.RIGHT,
relativeTo: PositionalTabRelativeTo.MARGIN,
leader: PositionalTabLeader.DOT,
}),
"3",
]}),
],
})
Multi-Column Layouts
// Equal-width columns
sections: [{
properties: {
column: {
count: 2, // number of columns
space: 720, // gap between columns in DXA (720 = 0.5 inch)
equalWidth: true,
separate: true, // vertical line between columns
},
},
children: [/* content flows naturally across columns */]
}]
// Custom-width columns (equalWidth must be false)
sections: [{
properties: {
column: {
equalWidth: false,
children: [
new Column({ width: 5400, space: 720 }),
new Column({ width: 3240 }),
],
},
},
children: [/* content */]
}]
Force a column break with a new section using type: SectionType.NEXT_COLUMN.
Table of Contents
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
Headers/Footers
sections: [{
properties: {
page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
},
headers: {
default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
},
footers: {
default: new Footer({ children: [new Paragraph({
children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
})] })
},
children: [/* content */]
}]
Critical Rules for docx-js
- Set page size explicitly - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- Landscape: pass portrait dimensions - docx-js swaps width/height internally; pass the short edge as width, the long edge as height, and set orientation: PageOrientation.LANDSCAPE
- Never use \n - use separate Paragraph elements
- Never use unicode bullets - use LevelFormat.BULLET with a numbering config
- PageBreak must be inside a Paragraph - a standalone PageBreak creates invalid XML
- ImageRun requires type - always specify png/jpg/etc.
- Always set table width with DXA - never use WidthType.PERCENTAGE (breaks in Google Docs)
- Tables need dual widths - the columnWidths array AND the cell width, and both must match
- Table width = sum of columnWidths - for DXA, ensure they add up exactly
- Always add cell margins - use margins: { top: 80, bottom: 80, left: 120, right: 120 } for readable padding
- Use ShadingType.CLEAR - never SOLID for table shading
- Never use tables as dividers/rules - cells have a minimum height and render as empty boxes (including in headers/footers); use border: { bottom: { style: BorderStyle.SINGLE, size: 6, color: "2E75B6", space: 1 } } on a Paragraph instead. For two-column footers, use tab stops (see the Tab Stops section), not tables
- TOC requires HeadingLevel only - no custom styles on heading paragraphs
- Override built-in styles - use exact IDs: "Heading1", "Heading2", etc.
- Include outlineLevel - required for TOC (0 for H1, 1 for H2, etc.)
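Two of these rules concern the text passed to a TextRun and can be checked on plain strings before any document is built. The sketch below uses a hypothetical helper, lint_run_text. The set of bullet characters is an illustrative assumption, not an exhaustive list.

```python
# Illustrative set of bullet-like characters (assumption, not exhaustive).
BULLET_CHARS = "\u2022\u25cf\u25e6\u25aa"


def lint_run_text(text: str) -> list[str]:
    """Flag text that breaks the 'no \\n' and 'no unicode bullets' rules."""
    problems = []
    if "\n" in text:
        problems.append("contains a newline; use separate Paragraph elements")
    first = text.lstrip()[:1]
    # Guard against the empty string, which is "in" every Python string.
    if first and first in BULLET_CHARS:
        problems.append("starts with a bullet character; use a numbering config")
    return problems


if __name__ == "__main__":
    for sample in ["Item", "\u2022 Item", "Line one\nLine two"]:
        print(repr(sample), lint_run_text(sample))
```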
Editing Existing Documents
Follow all 3 steps in order.
Step 1: Unpack
python scripts/office/unpack.py document.docx unpacked/
Extracts XML, pretty-prints it, merges adjacent runs, and converts smart quotes to XML entities (&#x201C; etc.) so they survive editing. Use --merge-runs false to skip run merging.
Step 2: Edit XML
Edit files in unpacked/word/. See XML Reference below for patterns.
Use "Claude" as the author for tracked changes and comments, unless the user explicitly requests a different name.
Use the Edit tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.
CRITICAL: Use smart quotes for new content. When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
| Entity | Character |
|---|---|
| &#x2018; | ‘ (left single) |
| &#x2019; | ’ (right single / apostrophe) |
| &#x201C; | “ (left double) |
| &#x201D; | ” (right double) |
Adding comments: Use comment.py to handle boilerplate across multiple XML files (text must be pre-escaped XML):
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
Then add markers to document.xml (see Comments in XML Reference).
Step 3: Pack
python scripts/office/pack.py unpacked/ output.docx --original document.docx
Validates with auto-repair, condenses XML, and creates the DOCX. Use --validate false to skip.
Auto-repair will fix:
- durableId >= 0x7FFFFFFF (regenerates a valid ID)
- Missing xml:space="preserve" on <w:t> with whitespace
Auto-repair won't fix:
- Malformed XML, invalid element nesting, missing relationships, schema violations
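To make the whitespace item concrete, the read-only sketch below lists <w:t> elements whose text has leading or trailing whitespace but no xml:space="preserve". It is not the logic used by pack.py: the function name is hypothetical, and it uses the standard-library parser only to stay self-contained, whereas the bundled scripts use defusedxml.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# ElementTree exposes the xml:space attribute under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def texts_missing_preserve(xml_text: str) -> list[str]:
    """Return the text of each w:t that needs xml:space="preserve" but lacks it."""
    root = ET.fromstring(xml_text)
    missing = []
    for t in root.iter(f"{{{W_NS}}}t"):
        text = t.text or ""
        if text != text.strip() and t.get(XML_SPACE) != "preserve":
            missing.append(text)
    return missing


SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>The term is </w:t></w:r>"
    "<w:r><w:t>60</w:t></w:r>"
    '<w:r><w:t xml:space="preserve"> days.</w:t></w:r>'
    "</w:p>"
)

if __name__ == "__main__":
    print(texts_missing_preserve(SAMPLE))
```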
Common Pitfalls
- Replace entire <w:r> elements: when adding tracked changes, replace the whole <w:r>...</w:r> block with <w:del>...<w:ins>... as siblings. Do not inject tracked-change tags inside a run.
- Preserve <w:rPr> formatting: copy the original run's <w:rPr> block into your tracked-change runs to keep bold, font size, etc.
XML Reference
Schema Compliance
- Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, <w:rPr> last
- Whitespace: add xml:space="preserve" to <w:t> with leading/trailing spaces
- RSIDs: must be 8-digit hex (e.g., 00AB1234)
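The RSID and element-order rules can be expressed as two small checks. This is a sketch with hypothetical helper names. The order check covers only the six <w:pPr> children listed above and ignores any other element names.

```python
import re

RSID_RE = re.compile(r"[0-9A-Fa-f]{8}")
# Relative order of the w:pPr children named in the list above.
PPR_ORDER = ["pStyle", "numPr", "spacing", "ind", "jc", "rPr"]


def is_valid_rsid(value: str) -> bool:
    """True if value is exactly eight hexadecimal digits."""
    return RSID_RE.fullmatch(value) is not None


def ppr_children_in_order(child_names: list[str]) -> bool:
    """True if the known w:pPr children appear in the listed relative order."""
    ranks = [PPR_ORDER.index(name) for name in child_names if name in PPR_ORDER]
    return ranks == sorted(ranks)


if __name__ == "__main__":
    print(is_valid_rsid("00AB1234"), is_valid_rsid("AB1234"))
    print(ppr_children_in_order(["pStyle", "numPr", "rPr"]))
    print(ppr_children_in_order(["rPr", "pStyle"]))
```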
Tracked Changes
Insertion:
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:t>inserted text</w:t></w:r>
</w:ins>
Deletion:
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
Inside <w:del>: use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.
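A read-only check for this rule might look like the following sketch, which reports any <w:t> or <w:instrText> found inside a <w:del>. The function and helper names are hypothetical, and the standard-library parser is used only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def misplaced_text_in_deletions(xml_text: str) -> list[str]:
    """List w:t / w:instrText elements that appear inside a w:del."""
    root = ET.fromstring(xml_text)
    found = []
    for deletion in root.iter(f"{{{W_NS}}}del"):
        for tag in ("t", "instrText"):
            for elem in deletion.iter(f"{{{W_NS}}}{tag}"):
                found.append(f"w:{tag}: {elem.text!r}")
    return found


def wrap(inner: str) -> str:
    """Wrap a fragment in a paragraph that declares the w: namespace."""
    return f'<w:p xmlns:w="{W_NS}">{inner}</w:p>'


GOOD = wrap('<w:del w:id="1" w:author="Claude"><w:r><w:delText>30</w:delText></w:r></w:del>')
BAD = wrap('<w:del w:id="1" w:author="Claude"><w:r><w:t>30</w:t></w:r></w:del>')

if __name__ == "__main__":
    print(misplaced_text_in_deletions(GOOD))
    print(misplaced_text_in_deletions(BAD))
```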
Minimal edits - only mark what changes:
<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
<w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
<w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
Deleting entire paragraphs/list items - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add <w:del/> inside <w:pPr><w:rPr>:
<w:p>
<w:pPr>
<w:numPr>...</w:numPr> <!-- list numbering if present -->
<w:rPr>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
</w:rPr>
</w:pPr>
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
</w:del>
</w:p>
Without the <w:del/> in <w:pPr><w:rPr>, accepting changes leaves an empty paragraph/list item.
Rejecting another author's insertion - nest the deletion inside their insertion:
<w:ins w:author="Jane" w:id="5">
<w:del w:author="Claude" w:id="10">
<w:r><w:delText>their inserted text</w:delText></w:r>
</w:del>
</w:ins>
Restoring another author's deletion - add an insertion after it (do not modify their deletion):
<w:del w:author="Jane" w:id="5">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
<w:r><w:t>deleted text</w:t></w:r>
</w:ins>
Comments
After running comment.py (see Step 2), add markers to document.xml. For replies, use the --parent flag and nest the markers inside the parent's.
CRITICAL: <w:commentRangeStart> and <w:commentRangeEnd> are siblings of <w:r>, never inside <w:r>.
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r><w:t>text</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
Images
- Add the image file to word/media/
- Add a relationship to word/_rels/document.xml.rels:
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
- Add the content type to [Content_Types].xml:
<Default Extension="png" ContentType="image/png"/>
- Reference it in document.xml:
<w:drawing>
<wp:inline>
<wp:extent cx="914400" cy="914400"/> <!-- EMUs: 914400 = 1 inch -->
<a:graphic>
<a:graphicData uri=".../picture">
<pic:pic>
<pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
Dependencies
- pandoc: text extraction
- docx: npm install -g docx (new documents)
- LibreOffice: PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
- Poppler: pdftoppm for images
Resource Files
LICENSE.txt
Binary resource
scripts/__init__.py
scripts/accept_changes.py
"""Accept all tracked changes in a DOCX file using LibreOffice.

Requires LibreOffice (soffice) to be installed.
"""

import argparse
import logging
import shutil
import subprocess
from pathlib import Path

from office.soffice import get_soffice_env

logger = logging.getLogger(__name__)

LIBREOFFICE_PROFILE = "/tmp/libreoffice_docx_profile"
MACRO_DIR = f"{LIBREOFFICE_PROFILE}/user/basic/Standard"

ACCEPT_CHANGES_MACRO = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Module1" script:language="StarBasic">
    Sub AcceptAllTrackedChanges()
        Dim document As Object
        Dim dispatcher As Object
        document = ThisComponent.CurrentController.Frame
        dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
        dispatcher.executeDispatch(document, ".uno:AcceptAllTrackedChanges", "", 0, Array())
        ThisComponent.store()
        ThisComponent.close(True)
    End Sub
</script:module>"""


def accept_changes(
    input_file: str,
    output_file: str,
) -> tuple[None, str]:
    input_path = Path(input_file)
    output_path = Path(output_file)

    if not input_path.exists():
        return None, f"Error: Input file not found: {input_file}"
    if input_path.suffix.lower() != ".docx":
        return None, f"Error: Input file is not a DOCX file: {input_file}"

    # Work on a copy so the input file is never modified.
    try:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(input_path, output_path)
    except Exception as e:
        return None, f"Error: Failed to copy input file to output location: {e}"

    if not _setup_libreoffice_macro():
        return None, "Error: Failed to setup LibreOffice macro"

    cmd = [
        "soffice",
        "--headless",
        f"-env:UserInstallation=file://{LIBREOFFICE_PROFILE}",
        "--norestore",
        "vnd.sun.star.script:Standard.Module1.AcceptAllTrackedChanges?language=Basic&location=application",
        str(output_path.absolute()),
    ]

    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
            env=get_soffice_env(),
        )
    except subprocess.TimeoutExpired:
        # NOTE: a timeout is also reported as success; the output file is not verified here.
        return (
            None,
            f"Successfully accepted all tracked changes: {input_file} -> {output_file}",
        )

    if result.returncode != 0:
        return None, f"Error: LibreOffice failed: {result.stderr}"

    return (
        None,
        f"Successfully accepted all tracked changes: {input_file} -> {output_file}",
    )


def _setup_libreoffice_macro() -> bool:
    macro_dir = Path(MACRO_DIR)
    macro_file = macro_dir / "Module1.xba"

    # Reuse the macro if it is already installed in the profile.
    if macro_file.exists() and "AcceptAllTrackedChanges" in macro_file.read_text():
        return True

    if not macro_dir.exists():
        # Start LibreOffice once so that it initializes the profile directory.
        subprocess.run(
            [
                "soffice",
                "--headless",
                f"-env:UserInstallation=file://{LIBREOFFICE_PROFILE}",
                "--terminate_after_init",
            ],
            capture_output=True,
            timeout=10,
            check=False,
            env=get_soffice_env(),
        )
        macro_dir.mkdir(parents=True, exist_ok=True)

    try:
        macro_file.write_text(ACCEPT_CHANGES_MACRO)
        return True
    except Exception as e:
        logger.warning(f"Failed to setup LibreOffice macro: {e}")
        return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Accept all tracked changes in a DOCX file"
    )
    parser.add_argument("input_file", help="Input DOCX file with tracked changes")
    parser.add_argument(
        "output_file", help="Output DOCX file (clean, no tracked changes)"
    )
    args = parser.parse_args()

    _, message = accept_changes(args.input_file, args.output_file)
    print(message)
    if "Error" in message:
        raise SystemExit(1)
scripts/comment.py
Binary resource
scripts/office/helpers/__init__.py
Binary resource
scripts/office/helpers/merge_runs.py
"""Merge adjacent runs with identical formatting in DOCX.

Merges adjacent <w:r> elements that have identical <w:rPr> properties.
Works on runs in paragraphs and inside tracked changes (<w:ins>, <w:del>).

Also:
- Removes rsid attributes from runs (revision metadata that doesn't affect rendering)
- Removes proofErr elements (spell/grammar markers that block merging)
"""

from pathlib import Path

import defusedxml.minidom


def merge_runs(input_dir: str) -> tuple[int, str]:
    doc_xml = Path(input_dir) / "word" / "document.xml"
    if not doc_xml.exists():
        return 0, f"Error: {doc_xml} not found"

    try:
        dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
        root = dom.documentElement

        _remove_elements(root, "proofErr")
        _strip_run_rsid_attrs(root)

        # Every element that directly contains at least one run.
        containers = {run.parentNode for run in _find_elements(root, "r")}

        merge_count = 0
        for container in containers:
            merge_count += _merge_runs_in(container)

        doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
        return merge_count, f"Merged {merge_count} runs"
    except Exception as e:
        return 0, f"Error: {e}"


def _find_elements(root, tag: str) -> list:
    results = []

    def traverse(node):
        if node.nodeType == node.ELEMENT_NODE:
            name = node.localName or node.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(node)
            for child in node.childNodes:
                traverse(child)

    traverse(root)
    return results


def _get_child(parent, tag: str):
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name == tag or name.endswith(f":{tag}"):
                return child
    return None


def _get_children(parent, tag: str) -> list:
    results = []
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(child)
    return results


def _is_adjacent(elem1, elem2) -> bool:
    # Adjacent means only whitespace text nodes separate the two elements.
    node = elem1.nextSibling
    while node:
        if node == elem2:
            return True
        if node.nodeType == node.ELEMENT_NODE:
            return False
        if node.nodeType == node.TEXT_NODE and node.data.strip():
            return False
        node = node.nextSibling
    return False


def _remove_elements(root, tag: str):
    for elem in _find_elements(root, tag):
        if elem.parentNode:
            elem.parentNode.removeChild(elem)


def _strip_run_rsid_attrs(root):
    for run in _find_elements(root, "r"):
        for attr in list(run.attributes.values()):
            if "rsid" in attr.name.lower():
                run.removeAttribute(attr.name)


def _merge_runs_in(container) -> int:
    merge_count = 0
    run = _first_child_run(container)
    while run:
        # Absorb following runs for as long as their formatting matches.
        while True:
            next_elem = _next_element_sibling(run)
            if next_elem and _is_run(next_elem) and _can_merge(run, next_elem):
                _merge_run_content(run, next_elem)
                container.removeChild(next_elem)
                merge_count += 1
            else:
                break
        _consolidate_text(run)
        run = _next_sibling_run(run)
    return merge_count


def _first_child_run(container):
    for child in container.childNodes:
        if child.nodeType == child.ELEMENT_NODE and _is_run(child):
            return child
    return None


def _next_element_sibling(node):
    sibling = node.nextSibling
    while sibling:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            return sibling
        sibling = sibling.nextSibling
    return None


def _next_sibling_run(node):
    sibling = node.nextSibling
    while sibling:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            if _is_run(sibling):
                return sibling
        sibling = sibling.nextSibling
    return None


def _is_run(node) -> bool:
    name = node.localName or node.tagName
    return name == "r" or name.endswith(":r")


def _can_merge(run1, run2) -> bool:
    # Runs merge only if both lack <w:rPr> or their <w:rPr> serialize identically.
    rpr1 = _get_child(run1, "rPr")
    rpr2 = _get_child(run2, "rPr")
    if (rpr1 is None) != (rpr2 is None):
        return False
    if rpr1 is None:
        return True
    return rpr1.toxml() == rpr2.toxml()


def _merge_run_content(target, source):
    for child in list(source.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            name = child.localName or child.tagName
            if name != "rPr" and not name.endswith(":rPr"):
                target.appendChild(child)


def _consolidate_text(run):
    t_elements = _get_children(run, "t")
    # Walk backwards so each removal leaves earlier indices valid.
    for i in range(len(t_elements) - 1, 0, -1):
        curr, prev = t_elements[i], t_elements[i - 1]
        if _is_adjacent(prev, curr):
            prev_text = prev.firstChild.data if prev.firstChild else ""
            curr_text = curr.firstChild.data if curr.firstChild else ""
            merged = prev_text + curr_text
            if prev.firstChild:
                prev.firstChild.data = merged
            else:
                prev.appendChild(run.ownerDocument.createTextNode(merged))
            if merged.startswith(" ") or merged.endswith(" "):
                prev.setAttribute("xml:space", "preserve")
            elif prev.hasAttribute("xml:space"):
                prev.removeAttribute("xml:space")
            run.removeChild(curr)
scripts/office/helpers/simplify_redlines.py
"""Simplify tracked changes by merging adjacent w:ins or w:del elements.

Merges adjacent <w:ins> elements from the same author into a single element.
Same for <w:del> elements. This makes heavily-redlined documents easier to
work with by reducing the number of tracked change wrappers.

Rules:
- Only merges w:ins with w:ins, w:del with w:del (same element type)
- Only merges if same author (ignores timestamp differences)
- Only merges if truly adjacent (only whitespace between them)
"""

import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

import defusedxml.minidom

WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def simplify_redlines(input_dir: str) -> tuple[int, str]:
    doc_xml = Path(input_dir) / "word" / "document.xml"
    if not doc_xml.exists():
        return 0, f"Error: {doc_xml} not found"

    try:
        dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
        root = dom.documentElement

        merge_count = 0
        # Tracked changes are merged within paragraphs and table cells.
        containers = _find_elements(root, "p") + _find_elements(root, "tc")
        for container in containers:
            merge_count += _merge_tracked_changes_in(container, "ins")
            merge_count += _merge_tracked_changes_in(container, "del")

        doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
        return merge_count, f"Simplified {merge_count} tracked changes"
    except Exception as e:
        return 0, f"Error: {e}"


def _merge_tracked_changes_in(container, tag: str) -> int:
    merge_count = 0
    tracked = [
        child
        for child in container.childNodes
        if child.nodeType == child.ELEMENT_NODE and _is_element(child, tag)
    ]
    if len(tracked) < 2:
        return 0

    i = 0
    while i < len(tracked) - 1:
        curr = tracked[i]
        next_elem = tracked[i + 1]
        if _can_merge_tracked(curr, next_elem):
            _merge_tracked_content(curr, next_elem)
            container.removeChild(next_elem)
            tracked.pop(i + 1)
            merge_count += 1
        else:
            i += 1
    return merge_count


def _is_element(node, tag: str) -> bool:
    name = node.localName or node.tagName
    return name == tag or name.endswith(f":{tag}")


def _get_author(elem) -> str:
    author = elem.getAttribute("w:author")
    if not author:
        # Fall back to any attribute whose local name is "author".
        for attr in elem.attributes.values():
            if attr.localName == "author" or attr.name.endswith(":author"):
                return attr.value
    return author


def _can_merge_tracked(elem1, elem2) -> bool:
    if _get_author(elem1) != _get_author(elem2):
        return False
    # Only whitespace text nodes may sit between the two elements.
    node = elem1.nextSibling
    while node and node != elem2:
        if node.nodeType == node.ELEMENT_NODE:
            return False
        if node.nodeType == node.TEXT_NODE and node.data.strip():
            return False
        node = node.nextSibling
    return True


def _merge_tracked_content(target, source):
    while source.firstChild:
        child = source.firstChild
        source.removeChild(child)
        target.appendChild(child)


def _find_elements(root, tag: str) -> list:
    results = []

    def traverse(node):
        if node.nodeType == node.ELEMENT_NODE:
            name = node.localName or node.tagName
            if name == tag or name.endswith(f":{tag}"):
                results.append(node)
            for child in node.childNodes:
                traverse(child)

    traverse(root)
    return results


def get_tracked_change_authors(doc_xml_path: Path) -> dict[str, int]:
    if not doc_xml_path.exists():
        return {}
    try:
        tree = ET.parse(doc_xml_path)
        root = tree.getroot()
    except ET.ParseError:
        return {}

    namespaces = {"w": WORD_NS}
    author_attr = f"{{{WORD_NS}}}author"
    authors: dict[str, int] = {}
    for tag in ["ins", "del"]:
        for elem in root.findall(f".//w:{tag}", namespaces):
            author = elem.get(author_attr)
            if author:
                authors[author] = authors.get(author, 0) + 1
    return authors


def _get_authors_from_docx(docx_path: Path) -> dict[str, int]:
    try:
        with zipfile.ZipFile(docx_path, "r") as zf:
            if "word/document.xml" not in zf.namelist():
                return {}
            with zf.open("word/document.xml") as f:
                tree = ET.parse(f)
                root = tree.getroot()

        namespaces = {"w": WORD_NS}
        author_attr = f"{{{WORD_NS}}}author"
        authors: dict[str, int] = {}
        for tag in ["ins", "del"]:
            for elem in root.findall(f".//w:{tag}", namespaces):
                author = elem.get(author_attr)
                if author:
                    authors[author] = authors.get(author, 0) + 1
        return authors
    except (zipfile.BadZipFile, ET.ParseError):
        return {}


def infer_author(modified_dir: Path, original_docx: Path, default: str = "Claude") -> str:
    modified_xml = modified_dir / "word" / "document.xml"
    modified_authors = get_tracked_change_authors(modified_xml)
    if not modified_authors:
        return default

    original_authors = _get_authors_from_docx(original_docx)

    # Count, per author, the tracked changes that the original did not have.
    new_changes: dict[str, int] = {}
    for author, count in modified_authors.items():
        original_count = original_authors.get(author, 0)
        diff = count - original_count
        if diff > 0:
            new_changes[author] = diff

    if not new_changes:
        return default
    if len(new_changes) == 1:
        return next(iter(new_changes))

    raise ValueError(
        f"Multiple authors added new changes: {new_changes}. "
        "Cannot infer which author to validate."
    )
scripts/office/pack.py
"""Pack a directory into a DOCX, PPTX, or XLSX file.

Validates with auto-repair, condenses XML formatting, and creates the Office file.

Usage:
    python pack.py <input_directory> <output_file> [--original <file>] [--validate true|false]

Examples:
    python pack.py unpacked/ output.docx --original input.docx
    python pack.py unpacked/ output.pptx --validate false
"""

import argparse
import sys
import shutil
import tempfile
import zipfile
from pathlib import Path

import defusedxml.minidom

from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator


def pack(
    input_directory: str,
    output_file: str,
    original_file: str | None = None,
    validate: bool = True,
    infer_author_func=None,
) -> tuple[None, str]:
    input_dir = Path(input_directory)
    output_path = Path(output_file)
    suffix = output_path.suffix.lower()

    if not input_dir.is_dir():
        return None, f"Error: {input_dir} is not a directory"

    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {output_file} must be a .docx, .pptx, or .xlsx file"

    # Validation only runs when an original file is supplied and exists.
    if validate and original_file:
        original_path = Path(original_file)
        if original_path.exists():
            success, output = _run_validation(
                input_dir, original_path, suffix, infer_author_func
            )
            if output:
                print(output)
            if not success:
                return None, f"Error: Validation failed for {input_dir}"

    # Work on a temporary copy so the unpacked directory is left untouched.
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_content_dir = Path(temp_dir) / "content"
        shutil.copytree(input_dir, temp_content_dir)

        for pattern in ["*.xml", "*.rels"]:
            for xml_file in temp_content_dir.rglob(pattern):
                _condense_xml(xml_file)

        output_path.parent.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in temp_content_dir.rglob("*"):
                if f.is_file():
                    zf.write(f, f.relative_to(temp_content_dir))

    return None, f"Successfully packed {input_dir} to {output_file}"


def _run_validation(
    unpacked_dir: Path,
    original_file: Path,
    suffix: str,
    infer_author_func=None,
) -> tuple[bool, str | None]:
    output_lines = []
    validators = []

    if suffix == ".docx":
        author = "Claude"
        if infer_author_func:
            try:
                author = infer_author_func(unpacked_dir, original_file)
            except ValueError as e:
                print(f"Warning: {e} Using default author 'Claude'.", file=sys.stderr)

        validators = [
            DOCXSchemaValidator(unpacked_dir, original_file),
            RedliningValidator(unpacked_dir, original_file, author=author),
        ]
    elif suffix == ".pptx":
        validators = [PPTXSchemaValidator(unpacked_dir, original_file)]

    # No validators are registered for .xlsx, so it passes trivially.
    if not validators:
        return True, None

    total_repairs = sum(v.repair() for v in validators)
    if total_repairs:
        output_lines.append(f"Auto-repaired {total_repairs} issue(s)")

    success = all(v.validate() for v in validators)

    if success:
        output_lines.append("All validations PASSED!")

    return success, "\n".join(output_lines) if output_lines else None


def _condense_xml(xml_file: Path) -> None:
    try:
        with open(xml_file, encoding="utf-8") as f:
            dom = defusedxml.minidom.parse(f)

        for element in dom.getElementsByTagName("*"):
            # Text elements (e.g. w:t) carry significant whitespace; leave them alone.
            if element.tagName.endswith(":t"):
                continue

            for child in list(element.childNodes):
                if (
                    child.nodeType == child.TEXT_NODE
                    and child.nodeValue
                    and child.nodeValue.strip() == ""
                ) or child.nodeType == child.COMMENT_NODE:
                    element.removeChild(child)

        xml_file.write_bytes(dom.toxml(encoding="UTF-8"))
    except Exception as e:
        print(f"ERROR: Failed to parse {xml_file.name}: {e}", file=sys.stderr)
        raise


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Pack a directory into a DOCX, PPTX, or XLSX file"
    )
    parser.add_argument("input_directory", help="Unpacked Office document directory")
    parser.add_argument("output_file", help="Output Office file (.docx/.pptx/.xlsx)")
    parser.add_argument(
        "--original",
        help="Original file for validation comparison",
    )
    parser.add_argument(
        "--validate",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Run validation with auto-repair (default: true)",
    )
    args = parser.parse_args()

    _, message = pack(
        args.input_directory,
        args.output_file,
        original_file=args.original,
        validate=args.validate,
    )
    print(message)

    if "Error" in message:
        sys.exit(1)

scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd
Binary resource
scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd
Binary resource
scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd
Binary resource
scripts/office/schemas/mce/mc.xsd
Binary resource
scripts/office/schemas/microsoft/wml-2010.xsd
Binary resource
scripts/office/schemas/microsoft/wml-2012.xsd
Binary resource
scripts/office/schemas/microsoft/wml-2018.xsd
Binary resource
scripts/office/schemas/microsoft/wml-cex-2018.xsd
Binary resource
scripts/office/schemas/microsoft/wml-cid-2016.xsd
Binary resource
scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd
Binary resource
scripts/office/schemas/microsoft/wml-symex-2015.xsd
Binary resource
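The `_condense_xml` step in `pack.py` above strips whitespace-only text nodes and comments before zipping, while skipping text elements whose tag ends in `:t`. The following is a minimal standalone sketch of that idea, not the script itself. It assumes the standard library's `xml.dom.minidom` in place of `defusedxml` (so it has no third-party dependency and should only be used on trusted input), and the `condense` function, the `sample` string, and the `urn:example` namespace are hypothetical names chosen for illustration.

```python
from xml.dom import minidom


def condense(xml_text: str) -> str:
    """Drop whitespace-only text nodes and comments, except inside *:t elements."""
    dom = minidom.parseString(xml_text)
    for element in dom.getElementsByTagName("*"):
        # Text-run elements such as w:t carry significant whitespace; skip them.
        if element.tagName.endswith(":t"):
            continue
        for child in list(element.childNodes):
            is_blank_text = (
                child.nodeType == child.TEXT_NODE
                and child.nodeValue
                and child.nodeValue.strip() == ""
            )
            if is_blank_text or child.nodeType == child.COMMENT_NODE:
                element.removeChild(child)
    return dom.documentElement.toxml()


# Hypothetical pretty-printed fragment with a comment and indentation.
sample = (
    '<w:p xmlns:w="urn:example">\n'
    "  <!-- note -->\n"
    "  <w:r>\n"
    "    <w:t> keep me </w:t>\n"
    "  </w:r>\n"
    "</w:p>"
)
condensed = condense(sample)
print(condensed)
```

The indentation and the comment disappear, while the leading and trailing spaces inside the `w:t` element survive, which is the property that matters for document text.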
scripts/office/soffice.py
"""
Helper for running LibreOffice (soffice) in environments where AF_UNIX
sockets may be blocked (e.g., sandboxed VMs). Detects the restriction
at runtime and applies an LD_PRELOAD shim if needed.

Usage:
    from office.soffice import run_soffice, get_soffice_env

    # Option 1 – run soffice directly
    result = run_soffice(["--headless", "--convert-to", "pdf", "input.docx"])

    # Option 2 – get env dict for your own subprocess calls
    env = get_soffice_env()
    subprocess.run(["soffice", ...], env=env)
"""

import os
import socket
import subprocess
import tempfile
from pathlib import Path


def get_soffice_env() -> dict:
    env = os.environ.copy()
    env["SAL_USE_VCLPLUGIN"] = "svp"

    if _needs_shim():
        shim = _ensure_shim()
        env["LD_PRELOAD"] = str(shim)

    return env


def run_soffice(args: list[str], **kwargs) -> subprocess.CompletedProcess:
    env = get_soffice_env()
    return subprocess.run(["soffice"] + args, env=env, **kwargs)


_SHIM_SO = Path(tempfile.gettempdir()) / "lo_socket_shim.so"


def _needs_shim() -> bool:
    # Probe whether AF_UNIX sockets can be created in this environment.
    try:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.close()
        return False
    except OSError:
        return True


def _ensure_shim() -> Path:
    # Compile the shim once and reuse the cached shared object afterwards.
    if _SHIM_SO.exists():
        return _SHIM_SO

    src = Path(tempfile.gettempdir()) / "lo_socket_shim.c"
    src.write_text(_SHIM_SOURCE)
    subprocess.run(
        ["gcc", "-shared", "-fPIC", "-o", str(_SHIM_SO), str(src), "-ldl"],
        check=True,
        capture_output=True,
    )
    src.unlink()
    return _SHIM_SO


_SHIM_SOURCE = r"""
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

static int (*real_socket)(int, int, int);
static int (*real_socketpair)(int, int, int, int[2]);
static int (*real_listen)(int, int);
static int (*real_accept)(int, struct sockaddr *, socklen_t *);
static int (*real_close)(int);
static int (*real_read)(int, void *, size_t);

/* Per-FD bookkeeping (FDs >= 1024 are passed through unshimmed). */
static int is_shimmed[1024];
static int peer_of[1024];
static int wake_r[1024];      /* accept() blocks reading this */
static int wake_w[1024];      /* close() writes to this */
static int listener_fd = -1;  /* FD that received listen() */

__attribute__((constructor))
static void init(void) {
    real_socket = dlsym(RTLD_NEXT, "socket");
    real_socketpair = dlsym(RTLD_NEXT, "socketpair");
    real_listen = dlsym(RTLD_NEXT, "listen");
    real_accept = dlsym(RTLD_NEXT, "accept");
    real_close = dlsym(RTLD_NEXT, "close");
    real_read = dlsym(RTLD_NEXT, "read");
    for (int i = 0; i < 1024; i++) {
        peer_of[i] = -1;
        wake_r[i] = -1;
        wake_w[i] = -1;
    }
}

/* ---- socket ---------------------------------------------------------- */
int socket(int domain, int type, int protocol) {
    if (domain == AF_UNIX) {
        int fd = real_socket(domain, type, protocol);
        if (fd >= 0) return fd;
        /* socket(AF_UNIX) blocked – fall back to socketpair(). */
        int sv[2];
        if (real_socketpair(domain, type, protocol, sv) == 0) {
            if (sv[0] >= 0 && sv[0] < 1024) {
                is_shimmed[sv[0]] = 1;
                peer_of[sv[0]] = sv[1];
                int wp[2];
                if (pipe(wp) == 0) {
                    wake_r[sv[0]] = wp[0];
                    wake_w[sv[0]] = wp[1];
                }
            }
            return sv[0];
        }
        errno = EPERM;
        return -1;
    }
    return real_socket(domain, type, protocol);
}

/* ---- listen ---------------------------------------------------------- */
int listen(int sockfd, int backlog) {
    if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
        listener_fd = sockfd;
        return 0;
    }
    return real_listen(sockfd, backlog);
}

/* ---- accept ---------------------------------------------------------- */
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen) {
    if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
        /* Block until close() writes to the wake pipe. */
        if (wake_r[sockfd] >= 0) {
            char buf;
            real_read(wake_r[sockfd], &buf, 1);
        }
        errno = ECONNABORTED;
        return -1;
    }
    return real_accept(sockfd, addr, addrlen);
}

/* ---- close ----------------------------------------------------------- */
int close(int fd) {
    if (fd >= 0 && fd < 1024 && is_shimmed[fd]) {
        int was_listener = (fd == listener_fd);
        is_shimmed[fd] = 0;
        if (wake_w[fd] >= 0) {  /* unblock accept() */
            char c = 0;
            write(wake_w[fd], &c, 1);
            real_close(wake_w[fd]);
            wake_w[fd] = -1;
        }
        if (wake_r[fd] >= 0) { real_close(wake_r[fd]); wake_r[fd] = -1; }
        if (peer_of[fd] >= 0) { real_close(peer_of[fd]); peer_of[fd] = -1; }
        if (was_listener)
            _exit(0);  /* conversion done – exit */
    }
    return real_close(fd);
}
"""


if __name__ == "__main__":
    import sys

    result = run_soffice(sys.argv[1:])
    sys.exit(result.returncode)

scripts/office/unpack.py
"""Unpack Office files (DOCX, PPTX, XLSX) for editing.

Extracts the ZIP archive, pretty-prints XML files, and optionally:
- Merges adjacent runs with identical formatting (DOCX only)
- Simplifies adjacent tracked changes from same author (DOCX only)

Usage:
    python unpack.py <office_file> <output_dir> [options]

Examples:
    python unpack.py document.docx unpacked/
    python unpack.py presentation.pptx unpacked/
    python unpack.py document.docx unpacked/ --merge-runs false
"""

import argparse
import sys
import zipfile
from pathlib import Path

import defusedxml.minidom

from helpers.merge_runs import merge_runs as do_merge_runs
from helpers.simplify_redlines import simplify_redlines as do_simplify_redlines

# Map each curly quote to its numeric character reference so the unpacked
# XML stays ASCII for these characters.
SMART_QUOTE_REPLACEMENTS = {
    "\u201c": "&#x201C;",
    "\u201d": "&#x201D;",
    "\u2018": "&#x2018;",
    "\u2019": "&#x2019;",
}


def unpack(
    input_file: str,
    output_directory: str,
    merge_runs: bool = True,
    simplify_redlines: bool = True,
) -> tuple[None, str]:
    input_path = Path(input_file)
    output_path = Path(output_directory)
    suffix = input_path.suffix.lower()

    if not input_path.exists():
        return None, f"Error: {input_file} does not exist"

    if suffix not in {".docx", ".pptx", ".xlsx"}:
        return None, f"Error: {input_file} must be a .docx, .pptx, or .xlsx file"

    try:
        output_path.mkdir(parents=True, exist_ok=True)

        with zipfile.ZipFile(input_path, "r") as zf:
            zf.extractall(output_path)

        xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels"))
        for xml_file in xml_files:
            _pretty_print_xml(xml_file)

        message = f"Unpacked {input_file} ({len(xml_files)} XML files)"

        if suffix == ".docx":
            if simplify_redlines:
                simplify_count, _ = do_simplify_redlines(str(output_path))
                message += f", simplified {simplify_count} tracked changes"

            if merge_runs:
                merge_count, _ = do_merge_runs(str(output_path))
                message += f", merged {merge_count} runs"

        for xml_file in xml_files:
            _escape_smart_quotes(xml_file)

        return None, message

    except zipfile.BadZipFile:
        return None, f"Error: {input_file} is not a valid Office file"
    except Exception as e:
        return None, f"Error unpacking: {e}"


def _pretty_print_xml(xml_file: Path) -> None:
    # Best effort: files that fail to parse are left as extracted.
    try:
        content = xml_file.read_text(encoding="utf-8")
        dom = defusedxml.minidom.parseString(content)
        xml_file.write_bytes(dom.toprettyxml(indent=" ", encoding="utf-8"))
    except Exception:
        pass


def _escape_smart_quotes(xml_file: Path) -> None:
    # Best effort: files that cannot be read as UTF-8 are left unchanged.
    try:
        content = xml_file.read_text(encoding="utf-8")
        for char, entity in SMART_QUOTE_REPLACEMENTS.items():
            content = content.replace(char, entity)
        xml_file.write_text(content, encoding="utf-8")
    except Exception:
        pass


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Unpack an Office file (DOCX, PPTX, XLSX) for editing"
    )
    parser.add_argument("input_file", help="Office file to unpack")
    parser.add_argument("output_directory", help="Output directory")
    parser.add_argument(
        "--merge-runs",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent runs with identical formatting (DOCX only, default: true)",
    )
    parser.add_argument(
        "--simplify-redlines",
        type=lambda x: x.lower() == "true",
        default=True,
        metavar="true|false",
        help="Merge adjacent tracked changes from same author (DOCX only, default: true)",
    )
    args = parser.parse_args()

    _, message = unpack(
        args.input_file,
        args.output_directory,
        merge_runs=args.merge_runs,
        simplify_redlines=args.simplify_redlines,
    )
    print(message)

    if "Error" in message:
        sys.exit(1)

scripts/office/validate.py
"""
Command line tool to validate Office document XML files against XSD schemas and tracked changes.

Usage:
    python validate.py <path> [--original <original_file>] [--auto-repair] [--author NAME]

The first argument can be either:
- An unpacked directory containing the Office document XML files
- A packed Office file (.docx/.pptx/.xlsx) which will be unpacked to a temp directory

Auto-repair fixes:
- paraId/durableId values that exceed OOXML limits
- Missing xml:space="preserve" on w:t elements with whitespace
"""

import argparse
import sys
import tempfile
import zipfile
from pathlib import Path

from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator


def main():
    parser = argparse.ArgumentParser(description="Validate Office document XML files")
    parser.add_argument(
        "path",
        help="Path to unpacked directory or packed Office file (.docx/.pptx/.xlsx)",
    )
    parser.add_argument(
        "--original",
        required=False,
        default=None,
        help="Path to original file (.docx/.pptx/.xlsx). If omitted, all XSD errors are reported and redlining validation is skipped.",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Enable verbose output",
    )
    parser.add_argument(
        "--auto-repair",
        action="store_true",
        help="Automatically repair common issues (hex IDs, whitespace preservation)",
    )
    parser.add_argument(
        "--author",
        default="Claude",
        help="Author name for redlining validation (default: Claude)",
    )
    args = parser.parse_args()

    path = Path(args.path)
    assert path.exists(), f"Error: {path} does not exist"

    original_file = None
    if args.original:
        original_file = Path(args.original)
        assert original_file.is_file(), f"Error: {original_file} is not a file"
        assert original_file.suffix.lower() in [".docx", ".pptx", ".xlsx"], (
            f"Error: {original_file} must be a .docx, .pptx, or .xlsx file"
        )

    # The file type comes from the original file when given, else from the path.
    file_extension = (original_file or path).suffix.lower()
    assert file_extension in [".docx", ".pptx", ".xlsx"], (
        f"Error: Cannot determine file type from {path}. Use --original or provide a .docx/.pptx/.xlsx file."
    )

    if path.is_file() and path.suffix.lower() in [".docx", ".pptx", ".xlsx"]:
        temp_dir = tempfile.mkdtemp()
        with zipfile.ZipFile(path, "r") as zf:
            zf.extractall(temp_dir)
        unpacked_dir = Path(temp_dir)
    else:
        assert path.is_dir(), f"Error: {path} is not a directory or Office file"
        unpacked_dir = path

    match file_extension:
        case ".docx":
            validators = [
                DOCXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
            ]
            if original_file:
                validators.append(
                    RedliningValidator(unpacked_dir, original_file, verbose=args.verbose, author=args.author)
                )
        case ".pptx":
            validators = [
                PPTXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
            ]
        case _:
            print(f"Error: Validation not supported for file type {file_extension}")
            sys.exit(1)

    if args.auto_repair:
        total_repairs = sum(v.repair() for v in validators)
        if total_repairs:
            print(f"Auto-repaired {total_repairs} issue(s)")

    success = all(v.validate() for v in validators)

    if success:
        print("All validations PASSED!")

    sys.exit(0 if success else 1)


if __name__ == "__main__":
    main()

scripts/office/validators/__init__.py
"""
Validation modules for Word document processing.
"""

from .base import BaseSchemaValidator
from .docx import DOCXSchemaValidator
from .pptx import PPTXSchemaValidator
from .redlining import RedliningValidator

__all__ = [
    "BaseSchemaValidator",
    "DOCXSchemaValidator",
    "PPTXSchemaValidator",
    "RedliningValidator",
]

scripts/office/validators/base.py
Binary resource
scripts/office/validators/docx.py
Binary resource
scripts/office/validators/pptx.py
Binary resource
scripts/office/validators/redlining.py
Binary resource
scripts/templates/comments.xml
Binary resource
scripts/templates/commentsExtended.xml
Binary resource
scripts/templates/commentsExtensible.xml
Binary resource
scripts/templates/commentsIds.xml
Binary resource
scripts/templates/people.xml
Binary resource
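The redlining validation above depends on knowing which author made the new tracked changes: `infer_author` counts `w:ins` and `w:del` elements per author in the modified and original documents and keeps only the authors whose count grew. The following is a minimal standalone sketch of that comparison, not the helper itself. It assumes that `WORD_NS` is the standard WordprocessingML main namespace, and `count_authors`, `new_changes`, `make_docx`, and `authors_in_docx` are hypothetical names that build and read toy in-memory archives rather than real Word files.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace; assumed to match the WORD_NS constant used above.
WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_authors(document_xml: bytes) -> dict[str, int]:
    """Count w:ins and w:del elements per w:author value."""
    root = ET.fromstring(document_xml)
    author_attr = f"{{{WORD_NS}}}author"
    counts: dict[str, int] = {}
    for tag in ("ins", "del"):
        for elem in root.iter(f"{{{WORD_NS}}}{tag}"):
            author = elem.get(author_attr)
            if author:
                counts[author] = counts.get(author, 0) + 1
    return counts


def new_changes(original: dict[str, int], modified: dict[str, int]) -> dict[str, int]:
    """Keep only authors whose change count grew between the two versions."""
    return {
        author: count - original.get(author, 0)
        for author, count in modified.items()
        if count > original.get(author, 0)
    }


def make_docx(changes: list[tuple[str, str]]) -> bytes:
    """Build a toy in-memory archive with one tracked change per (tag, author)."""
    body = "".join(
        f'<w:{tag} w:id="{i}" w:author="{author}"/>'
        for i, (tag, author) in enumerate(changes)
    )
    xml = f'<w:document xmlns:w="{WORD_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def authors_in_docx(data: bytes) -> dict[str, int]:
    """Read word/document.xml from an in-memory archive and count authors."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return count_authors(zf.read("word/document.xml"))


original = authors_in_docx(make_docx([("ins", "Alice")]))
modified = authors_in_docx(
    make_docx([("ins", "Alice"), ("ins", "Claude"), ("del", "Claude")])
)
print(new_changes(original, modified))
```

When exactly one author remains after the comparison, that name is the natural candidate for the `--author` option of `validate.py`. When several remain, the helper above raises an error instead of guessing.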