Do you wonder why Word 2007 documents have a new file extension, docx?
After all, Word documents have used the filename suffix .doc for years. So where did the X come from?
The X stands for XML (Extensible Markup Language), which is the markup language used behind the scenes in Office 2007 file formats.
Previous versions of Word stored documents in a proprietary binary format. Binary files of this kind tend to be hard to repair once they are damaged, and they are often "bloated."
By changing the underlying file format, Microsoft was able to create new tools and features in Word 2007 that would not have been possible otherwise.
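Compared to the old binary files, the XML-based format is compact, portable, and stable: the XML parts are compressed inside a zip package, and damage to one part does not necessarily ruin the others. In other words, a Word 2007 document is less susceptible to irreparable damage.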
Let's perform an experiment to compare file sizes between identical binary- and XML-based documents. For this experiment, I created a Word document containing text and a photo.
Then, I saved it as a Word 97–2003 (binary) document and also as a Word 2007 (XML) document.
The size of the .doc file is 102 KB, but the size of the .docx file is 91 KB. That is a savings of 11 KB!
Eleven kilobytes may not seem like much, but this is a very small file with only two sentences of text and one photo.
Now imagine millions of large documents across an entire organization. Data storage requirements can be significantly reduced by converting older Word documents to the docx format.
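The same comparison can be repeated for any pair of files with a few lines of Python. This is only a minimal sketch: the function name size_savings and the file names test.doc and test.docx are assumptions, standing in for wherever you saved your two copies.

```python
import os


def size_savings(doc_path, docx_path):
    """Return (bytes saved, percent saved) going from doc_path to docx_path."""
    doc_size = os.path.getsize(doc_path)
    docx_size = os.path.getsize(docx_path)
    saved = doc_size - docx_size
    # Guard against an empty .doc file to avoid dividing by zero.
    percent = 100.0 * saved / doc_size if doc_size else 0.0
    return saved, percent


if __name__ == "__main__":
    # Hypothetical file names; point these at your own two copies.
    doc_path, docx_path = "test.doc", "test.docx"
    if os.path.exists(doc_path) and os.path.exists(docx_path):
        saved, percent = size_savings(doc_path, docx_path)
        print(f"Saved {saved} bytes ({percent:.1f}%)")
```

For the experiment above, going from 102 KB to 91 KB works out to a reduction of roughly 10.8 percent.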
XML is an open specification. It is easily shared between different operating systems, such as Windows, Unix, or Mac OS X.
A Word 2007 file is actually a zip archive containing XML files, along with other files that hold information such as graphics.
Anyone can unzip a Word 2007 document with a zip utility, then use a text or XML editor to open the XML file that contains the textual data.
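To peek inside the package without extracting anything, you can list its contents. The sketch below uses Python's standard zipfile module; the function name list_docx_parts and the file name test.docx are assumptions made for illustration.

```python
import os
import zipfile


def list_docx_parts(docx_path):
    """Return the names of the files stored inside a .docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return package.namelist()


if __name__ == "__main__":
    # Hypothetical file name; use the path to your own document.
    path = "test.docx"
    if os.path.exists(path):
        for name in list_docx_parts(path):
            print(name)
```

A typical document should show entries such as word/document.xml, though the exact list depends on what the document contains.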
Unzip a Docx
Let's unzip a Word document so you can see XML in action.
Open Word 2007 and create a document with text and some graphics, then save it as test.docx to your desktop. Now change the name of the file to test.zip.
Unzip the file using the unzip utility that comes with Windows or your favorite Zip program.
Here is how to open test.zip using Windows XP: right-click the file, choose Extract All, and follow the Extraction Wizard.
You should now have a folder named Test on your desktop. If you open the folder, you will see several folders and an XML file inside.
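If you would rather script the extraction, Python's zipfile module can do the same job, and it does not require renaming the file to .zip first. This is a sketch under assumptions: the function name unzip_docx and the paths test.docx and Test are placeholders for your own.

```python
import os
import zipfile


def unzip_docx(docx_path, dest_dir):
    """Extract every file in the .docx package into dest_dir."""
    with zipfile.ZipFile(docx_path) as package:
        package.extractall(dest_dir)
    return dest_dir


if __name__ == "__main__":
    # Hypothetical paths; adjust to where your document lives.
    source, target = "test.docx", "Test"
    if os.path.exists(source):
        unzip_docx(source, target)
        print(sorted(os.listdir(target)))
```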
Open the word folder. The file that contains your textual data is named document.xml. Open it with a text editor. You will see the XML markup, with bits and pieces of your original text mixed in.
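Picking the text out of the markup by eye gets tedious, so here is a rough sketch that pulls it out of document.xml. It assumes the text sits in w:t elements inside w:p paragraph elements of the WordprocessingML namespace; the function names are hypothetical, and the sketch makes no attempt to handle special cases such as text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def text_from_document_xml(xml_bytes):
    """Return one string per paragraph found in the document.xml contents."""
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        pieces = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def text_from_docx(docx_path):
    """Read word/document.xml straight from the package and extract its text."""
    with zipfile.ZipFile(docx_path) as package:
        return text_from_document_xml(package.read("word/document.xml"))
```

Calling text_from_docx("test.docx") on your own file should return a list of strings, one per paragraph.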
You can find your photo(s) in the media folder, which is inside the word folder.
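If all you want is the pictures, a short script can copy them out of the package directly. This sketch assumes the images are stored under word/media/ inside the package; the function name extract_media and the paths shown are placeholders.

```python
import os
import zipfile


def extract_media(docx_path, dest_dir):
    """Copy every file under word/media/ into dest_dir; return the saved paths."""
    saved = []
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Skip directory entries and anything outside word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = os.path.join(dest_dir, os.path.basename(name))
                with open(target, "wb") as out:
                    out.write(package.read(name))
                saved.append(target)
    return saved


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own document and output folder.
    if os.path.exists("test.docx"):
        print(extract_media("test.docx", "photos"))
```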
The unzipping technique is one way to recover file information if a docx file becomes corrupted.
But one advantage of using docx files is that they are less likely to become corrupted in the first place. That is because the XML-based format is more stable than the old binary format.
If a Word 2007 document does become damaged, you can try an Open and Repair procedure to recover the file. (See my Troubleshooting tips for instructions.)
One problem with the new docx format is that it can cause difficulties when sharing documents with users of previous versions. Often, co-workers complain that they cannot open docx files. (This is usually because they don't have the compatibility pack installed.)
But Word 2007 is backward compatible, meaning it was designed to work with previous versions.
You can work with and save Word 2007 documents in the format of previous versions (97-2003). These documents are saved with the .doc file extension. This is called compatibility mode.
However, working in compatibility mode will cause you to lose some functionality. (For instance, you will not be able to use the newest chart features.)
You can always tell if you are in compatibility mode by looking at the title bar above the ribbon. The name of your document will be followed by [Compatibility Mode].
If you share documents with co-workers who use previous versions, you can still work in Word 2007 mode to take advantage of its features, then save a copy as a .doc file to share.
Then you can have the best of both worlds!