Something that comes as a surprise to many people is that the .docx file type, along with the other file types in the Office Open XML family, is a ZIP archive that can be opened with a file archiver like 7-Zip. (If you have problems opening the file, change the .docx extension to .zip, e.g. example.docx becomes example.zip.)
If you do this you'll see that a .docx is actually a wrapper for a set of XML files: mostly .xml files, plus a few .rels files (.rels are XML relationship files that tell a program how the .xml files relate to each other).
So a .docx file isn't one file; it's a whole bunch of files that Microsoft Word parses into a coherent whole for the user to work on. Word is essentially a GUI front end for XML.
The correct technical term for a Word document is a package (the .docx file), which is made up of parts (the .xml files) that are connected using relationships (the .rels files).
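To make the package, parts and relationships idea concrete, here is a minimal Python sketch that uses only the standard library. The function names (list_parts, read_relationships, make_sample_package) are made up for this example, and the sample package it builds is a tiny stand-in rather than a document Word would open. Note that part names inside the archive use forward slashes, even though Windows tools display them with backslashes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by .rels parts.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_parts(docx):
    """Return the names of all entries in the package, in archive order."""
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def read_relationships(docx, rels_name="_rels/.rels"):
    """Return (Id, Type, Target) tuples from one .rels part."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read(rels_name))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.iter(RELS_NS + "Relationship")
    ]


def make_sample_package():
    """Build a tiny stand-in package in memory (hypothetical, not a valid Word file)."""
    rels = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
        'Target="word/document.xml"/>'
        '</Relationships>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("_rels/.rels", rels)
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer
```

Both functions accept a path as well as a file-like object, so list_parts("example.docx") should work on a real document. Passing rels_name="word/_rels/document.xml.rels" reads the relationships that belong to document.xml instead of the package-level ones.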
The list of files and folders in a new, blank Word 2016/365 document is as follows:
- \docProps\app.xml - Metadata from the application.
- \docProps\core.xml - Metadata from the document.
- \word\theme\theme1.xml - The default theme for the document.
- \word\_rels\document.xml.rels - The relationship file that contains the relations between document.xml and other parts of the package.
- \word\document.xml - The actual content in your document.
- \word\fontTable.xml - A list of all the fonts used in the document.
- \word\settings.xml - Various document settings, including defaults and security settings.
- \word\styles.xml - The styles that are available in the document (all the styles that are available, not just the ones that have been used).
- \word\webSettings.xml - The web-specific settings used by the document.
- \_rels\.rels - The package-level relationships for the document.
- [Content_Types].xml - The various types of parts that are used in the package.
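Because every part in the list above is plain XML, the metadata can be read without Word. The following is a rough sketch, not a complete reader: it pulls a few fields out of docProps/core.xml. The function name read_core_properties, the choice of fields, and the sample values are all assumptions for illustration, and the sample core.xml is simplified compared with what Word writes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by core.xml (core properties plus Dublin Core).
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection of fields; core.xml can hold others.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx):
    """Return selected core properties as a dict; missing fields map to None."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        key: root.findtext(path, default=None, namespaces=NAMESPACES)
        for key, path in FIELDS.items()
    }


def make_sample_package():
    """Build a stand-in package holding only a simplified core.xml (hypothetical data)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/">'
        '<dc:creator>Example Author</dc:creator>'
        '<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>'
        '<dcterms:created>2016-01-01T00:00:00Z</dcterms:created>'
        '</cp:coreProperties>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer
```

On a real document, read_core_properties("example.docx") should return the author and editing details that Word stored, which is worth checking before sharing a file.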
If you add things to your document then other folders and files will be created by Word.
For example, embedded media such as images will be in the \word\media folder and embedded files such as spreadsheets will be in the \word\embeddings folder.
An important issue to note is that when you embed an image and then edit it, the original image is stored in the \word\media folder, and the edits are stored in \word\document.xml, in the markup for the picture (a <w:drawing> element in current versions of Word, or a <w:pict> element for older VML-based pictures). If you've cropped or otherwise obscured private information using Word's built-in editing tools, then anyone who opens the .docx package using something like 7-Zip can still see the original picture.
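A quick way to check a document for this problem is to list what is stored under word/media and look for crop settings in document.xml. The sketch below assumes the crop is recorded as a DrawingML srcRect element, which is how current versions of Word appear to store it; older VML pictures record crops differently and are not covered. The function names and the sample package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace; crops are assumed to be stored as <a:srcRect>.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def list_media(docx):
    """Return (name, uncompressed size) for every file under word/media/."""
    with zipfile.ZipFile(docx) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            if info.filename.startswith("word/media/")
        ]


def find_crops(docx):
    """Return the attributes of each srcRect element found in document.xml."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [dict(rect.attrib) for rect in root.iter(A_NS + "srcRect")]


def make_sample_package():
    """Build a stand-in package with one fake image and one crop (hypothetical data)."""
    document_xml = (
        '<w:document'
        ' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
        ' xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
        '<a:srcRect l="25000" r="25000"/>'
        '</w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
        package.writestr("word/media/image1.png", b"not really a PNG")
    buffer.seek(0)
    return buffer
```

If find_crops returns anything for a real document, the uncropped originals listed by list_media are still in the package, and the image should be cropped in an image editor and re-inserted before the file is shared.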
There is a lot more that can be said about the Open XML format, so rather than reinvent the wheel, here are some links to explore:
- Word 2007 XML Walkthrough (from Microsoft for Word 2007 but the basic information seems to be correct for Word 2016 as well)
- How to Manipulate OOXML Documents (from Microsoft for Word 2007 but the basic information seems to be correct for Word 2016 as well)
- Open XML Overview (PDF, 14 pages)
- ECMA-376 specifications (European Computer Manufacturers Association)
- ISO Specifications (search for 29500 for the 5 documents and electronic inserts)
- OOXML schema documents (the .xsds)
Finally, if you're thinking that this is very interesting but not particularly useful for writing topic-based documentation then you should know that Yes, you can write DITA in Word!