how to tell a file by its content



Here's how to tell what a file really is by its content. Sometimes people think they can rename a file and it will suddenly work as a different document type. Wrong: renaming changes only the name, not the bytes inside the file. Due to the license agreements of Microsoft and Adobe products, I can't look into the output of their files and give you the file format. The file formats of Microsoft documents are available from their web site.
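To see this for yourself, you can dump the first few bytes of a file before and after renaming it. What follows is only a minimal Python sketch: the helper names `first_bytes` and `as_hex` and the sample file names are made up for illustration, and the demo writes a throwaway file into a temporary directory rather than touching real documents.

```python
import os
import tempfile


def first_bytes(path, count=16):
    """Return up to `count` bytes from the start of the file at `path`."""
    with open(path, "rb") as handle:
        return handle.read(count)


def as_hex(data):
    """Format bytes as 0xNN values separated by spaces."""
    return " ".join(f"0x{byte:02X}" for byte in data)


def demo():
    """Write a file with a PNG-style header, rename it, and compare the bytes."""
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "picture.png")  # hypothetical name
        renamed = os.path.join(folder, "picture.txt")   # hypothetical name
        with open(original, "wb") as handle:
            handle.write(b"\x89PNG\r\n\x1a\n" + b"rest of the file")
        before = first_bytes(original, 8)
        os.rename(original, renamed)
        after = first_bytes(renamed, 8)
        print("before rename:", as_hex(before))
        print("after rename: ", as_hex(after))
        print("same content: ", before == after)


if __name__ == "__main__":
    demo()
```

The rename only changes the directory entry, so both dumps show the same bytes.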

file extensions and their content
extension | file type | content
.7z | 7-Zip archive | starts with 7z (followed by 0xBC 0xAF 0x27 0x1C)
.zip, .docx, .dotx, .pptx, .ppsx, etc | zip/PKZIP (all MS Office documents from 2007 and up are zipped XML files bundled with your pictures, videos, and other files) | starts with PK
.exe | Windows/DOS executable | starts with MZ
.gif | GIF image | starts with GIF87a or GIF89a
.png | Portable Network Graphics (PNG) | first byte 0x89 followed by PNG
.xcf | GIMP native format | starts with gimp xcf file (newer versions of the format replace "file" with a version tag such as v001)
.mp3 | MP3 audio | starts with ID3 if the file has an ID3v2 tag. otherwise starts with an MPEG audio frame header: first byte 0xFF, second byte with its top three bits set. one sample file starts with 0xFF 0xE3 0x10 0x0C 0x00 0x01
.m4a | MPEG-4 audio | has ftyp (0x66 0x74 0x79 0x70) at offset 4. the four bytes before it are a size field and the bytes after it name the brand, so both vary from file to file. one sample file starts with 0x00 0x00 0x00 0x18 0x66 0x74 0x79 0x70 0x6D 0x70 0x34 0x32 0x00 0x00 0x00 0x00 0x6D 0x70 0x34 0x32 0x69 0x73 0x6F 0x6D 0x00
.mpg | MPEG program stream | starts with 0x00 0x00 0x01 0xBA
.avi | AVI video | starts with RIFF, with AVI and a space at offset 8 (other RIFF files such as .wav also start with RIFF)
.bmp | BMP image | starts with BM
.doc | document | many formats. could be WordPad RTF (starts with {\rtf), could be MS Office 97-2003, could be plain ASCII text. MS Office 97-2003 binary format documents are documented on MSDN.
.ppt, .pps, .xls, .xlw, .dot | MS Office 97-2003 binary format documents | documented on MSDN.
.pub, .pubx | Microsoft Publisher | unpublished format.
.csv | Comma Separated Values | a bunch of fields separated by commas. strings may or may not be surrounded by double quotes. line endings differ depending on platform.
.txt | text file or Tab Separated Values | if it's a text file, it has plain ASCII or Unicode text, viewable with Notepad. if it's Tab Separated Values, it's a text file with the Horizontal Tab character (ASCII code 0x09) separating the fields.
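
The signatures in the table can be turned into a small lookup. This is a rough Python sketch, not a replacement for a real tool like the Unix `file` command: the names `SIGNATURES`, `sniff`, and `sniff_file` are made up here, the 7-Zip and GIF entries use the longer well-known forms of the signatures, and the MP3 check assumes a file with no ID3 tag begins directly with an MPEG audio frame. Two-byte signatures like PK, MZ, and BM are weak, so treat a match as a hint only.

```python
# (offset, magic bytes, description) for signatures at a fixed position.
SIGNATURES = [
    (0, b"7z\xbc\xaf\x27\x1c", "7-Zip archive"),
    (0, b"PK", "zip/PKZIP (possibly an MS Office 2007+ document)"),
    (0, b"MZ", "Windows/DOS executable"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"\x89PNG", "PNG image"),
    (0, b"gimp xcf ", "GIMP XCF image"),
    (0, b"ID3", "MP3 audio with an ID3v2 tag"),
    (4, b"ftyp", "MPEG-4 container (for example .m4a)"),
    (0, b"\x00\x00\x01\xba", "MPEG program stream"),
    (0, b"BM", "BMP image"),
]


def sniff(header):
    """Guess a file type from its first bytes; return None if nothing matches."""
    for offset, magic, description in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return description
    # RIFF is a container: the four bytes at offset 8 say what is inside.
    if header[:4] == b"RIFF":
        return "AVI video" if header[8:12] == b"AVI " else "other RIFF file"
    # MPEG audio frame sync: 0xFF, then a byte whose top three bits are set.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MPEG audio frame (possibly MP3)"
    return None


def sniff_file(path):
    """Read the first 16 bytes of `path` and guess its type."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

Call `sniff_file` with the path of any file to get a guess based on its content alone; the extension is never consulted. Formats with no fixed signature, such as .csv and .txt, come back as None.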