Note: This is the file index containing the complete dataset of file examples. The dataset is also available as a JSON file here
This repository can be defined as:
- A collection of file examples of different formats.
- Samples of files and structures for everyday use.
- A compendium of links of sample files throughout the internet.
The general idea is to provide an index of materials for those situations in software development or design where you need to run unit tests against real-world files.
Below, the files are listed by category and type/context.
7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. See more at 7z
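A quick way to confirm that a sample really is a 7z archive is to compare its first bytes with the 7z signature header (`7z` followed by `BC AF 27 1C`). This is a minimal sketch: `is_7z` is a hypothetical helper, and it does not parse anything beyond the signature.

```python
# Every 7z archive starts with this 6-byte signature.
SEVEN_ZIP_SIGNATURE = b"7z\xbc\xaf\x27\x1c"


def is_7z(data: bytes) -> bool:
    """Return True if the given bytes begin with the 7z signature."""
    return data[:len(SEVEN_ZIP_SIGNATURE)] == SEVEN_ZIP_SIGNATURE


# Hypothetical usage with a local copy of the sample archive:
# with open("WikipediaHomePage.7z", "rb") as f:
#     print(is_7z(f.read(6)))
```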
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A 7z archive of the Wikipedia.org homepage HTML file | WikipediaHomePage.7z | WikipediaHomePage.7z | 52.47 KB |
gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems. See more at gzip
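Python's standard `gzip` module is enough to produce or check gzip data in a test. The sketch below compresses a synthetic HTML-like payload (not the sample file) and shows the gzip magic bytes `1f 8b` followed by the compression method byte `08` (DEFLATE).

```python
import gzip

# Synthetic HTML-like payload, repeated so the compression is visible.
original = b"<html><body>Hello from a gzip sample</body></html>\n" * 50
compressed = gzip.compress(original)

# Magic bytes 0x1f 0x8b, then the method byte 8 (DEFLATE).
print(compressed[:3].hex(), len(original), len(compressed))

restored = gzip.decompress(compressed)
print(restored == original)  # lossless round trip
```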
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A gzip archive of the HTML version of the page Wikipedia.org | Wikipedia.html.gz | Wikipedia.html.gz | 56.4 KB |
An optical disc image (or ISO image, from the ISO 9660 file system used with CD-ROM media) is a disk image that contains everything that would be written to an optical disc, disc sector by disc sector, including the optical disc file system. See more at ISO
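An ISO 9660 image reserves its first 16 sectors (the system area). The volume descriptors start at sector 16, which is byte offset 32768 with 2048-byte sectors, and each descriptor carries the standard identifier `CD001`. The sketch below checks only that identifier and assumes a plain ISO 9660 image, because other disc file systems such as UDF use different identifiers. `is_iso9660` is a hypothetical helper, and the image it tests is synthetic.

```python
import io

SECTOR_SIZE = 2048
# Sectors 0-15 are the system area; volume descriptors start at sector 16.
DESCRIPTOR_OFFSET = 16 * SECTOR_SIZE


def is_iso9660(stream) -> bool:
    """Check for the 'CD001' standard identifier in the first volume descriptor."""
    stream.seek(DESCRIPTOR_OFFSET)
    descriptor = stream.read(7)
    # Byte 0: descriptor type, bytes 1-5: "CD001", byte 6: descriptor version.
    return len(descriptor) == 7 and descriptor[1:6] == b"CD001"


# Synthetic image: empty system area, then a primary volume descriptor (type 1).
fake_image = io.BytesIO(b"\x00" * DESCRIPTOR_OFFSET + b"\x01CD001\x01" + b"\x00" * 2041)
print(is_iso9660(fake_image))
```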
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| An ISO image of a CD-ROM containing a copy of the Wikipedia.org HTML page inside one folder | wikipedia-org-one-page.iso | wikipedia-org-one-page.iso | 216.0 KB |
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. A tar archive consists of a series of file objects; each file object includes any file data and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes. See more at tar
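Python's standard `tarfile` module can demonstrate the 512-byte block layout described for tar. The sketch below builds a small in-memory USTAR archive with one 10-byte file whose name and content are made up. It then checks that the header comes first, that the data follows and is zero-padded to 512 bytes, and that the whole archive is a multiple of 512 bytes.

```python
import io
import tarfile

BLOCK = 512

# Build an in-memory tar containing one hypothetical 10-byte file.
payload = b"0123456789"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

raw = buffer.getvalue()
print(len(raw), raw[257:262])  # total size and the "ustar" magic inside the header

# Read the archive back.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    names = [member.name for member in tar.getmembers()]
print(names)
```

Python also appends two zero blocks at the end of the archive and pads the result to a full record, which is why the total size is larger than header plus data.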
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A tar archive containing two files: the HTML versions of Wikipedia.org and www.isitchristmas.com | Wikipedia.tar | Wikipedia.tar | 168.0 KB |
The WARC (Web ARChive) format is used specifically for archiving web crawls. It is a revision of the Internet Archive's ARC file format and specifies a method for combining multiple digital resources into an aggregate archive file, together with related information.
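A WARC record starts with a version line such as `WARC/1.0`, followed by `Name: value` header lines and a blank line. After that comes a content block of `Content-Length` bytes, and each record ends with two CRLFs. The minimal sketch below walks the records of an uncompressed `.warc` stream and ignores `.warc.gz`. It matches header names case-sensitively, which is a simplification. `iter_warc_headers` is hypothetical, and the two records it parses are synthetic placeholders.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of header fields for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            return                                  # end of file
        if not version.strip():
            continue                                # tolerate stray blank lines
        headers = {"version": version.strip().decode("ascii")}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):       # blank line ends the header block
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the content block, then the CRLF CRLF that terminates every record.
        stream.read(int(headers["Content-Length"]))
        stream.read(4)
        yield headers


# Synthetic two-record stream (field values are placeholders, not from the sample file).
record = (
    b"WARC/1.0\r\n"
    b"WARC-Type: warcinfo\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
for fields in iter_warc_headers(io.BytesIO(record * 2)):
    print(fields["version"], fields["WARC-Type"], fields["Content-Length"])
```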
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| One page WARC archive of the Wikipedia.org homepage | WikipediaOrg-20201212031238412.warc | WikipediaOrg-20201212031238412.warc | 421.33 KB |
The ZIM file format is an open file format that stores wiki content for offline usage. Its primary focus is the contents of Wikipedia and other Wikimedia projects. The format allows for the compression of articles and features a full-text search index as well as native category and image handling similar to MediaWiki. Unlike native Wikipedia XML database dumps, the entire file is easily indexable and readable with a program like Kiwix (source). The Kiwix open-source project offers a collection of ZIM archives here.
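Before handing a large ZIM download to a reader, you can sanity-check its first bytes. As I understand the openZIM header layout, a file starts with the little-endian magic number 72173914 (the bytes `ZIM\x04`), followed by 16-bit major and minor version fields. Treat that layout as an assumption to verify against the openZIM specification. `read_zim_version` is a hypothetical helper, and the header it parses is synthetic.

```python
import struct

# 72173914 == 0x044D495A; stored little-endian it reads b"ZIM\x04".
ZIM_MAGIC = 72173914


def read_zim_version(header: bytes):
    """Return (major, minor) from the first 8 bytes of a ZIM file, or None."""
    if len(header) < 8:
        return None
    magic, major, minor = struct.unpack("<IHH", header[:8])
    if magic != ZIM_MAGIC:
        return None
    return major, minor


sample_header = struct.pack("<IHH", ZIM_MAGIC, 6, 1)  # synthetic, not from a real file
print(sample_header[:4], read_zim_version(sample_header))
```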
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A ZIM archive containing every book from the English-language collection of Project Gutenberg (hosted by Kiwix) | gutenberg_en_all_2020-12.zim.url | gutenberg_en_all_2020-12.zim.url | 61 GB | 2020-12 |
| Top 100 articles from Wikipedia EN-US as a ZIM file, hosted by Kiwix | wikipedia_en_100_2020-10.zim.url | wikipedia_en_100_2020-10.zim.url | 304 MB | 2020-10 |
| Every page and picture from Wikipedia EN-US, hosted by Kiwix | wikipedia_en_all_maxi_2020-11.zim.url | wikipedia_en_all_maxi_2020-11.zim.url | 94 GB | 2020-11 |
| A ZIM archive containing every page from Wikipedia EN-US without pictures (hosted by Kiwix) | wikipedia_en_all_nopic_2020-10.zim.url | wikipedia_en_all_nopic_2020-10.zim.url | 39 GB | 2020-10 |
ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
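Python's standard `zipfile` module can build and read ZIP archives in memory, which is handy when a test needs a fresh archive instead of a checked-in sample. The sketch compresses a synthetic HTML string with DEFLATE and reads it back unchanged. It also shows the `PK\x03\x04` local file header signature at the start of the archive. The entry name `wikipedia.html` is made up.

```python
import io
import zipfile

# Synthetic HTML payload (not the sample file).
html = b"<html><body>" + b"Wikipedia " * 200 + b"</body></html>"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("wikipedia.html", html)

buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    info = zf.getinfo("wikipedia.html")
    restored = zf.read("wikipedia.html")

print(names, info.file_size, info.compress_size)
```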
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A zip archive of the HTML version of the page Wikipedia.org | WikipediaOrg.zip | WikipediaOrg.zip | 56.52 KB | ZIP |
Makefile.am is a programmer-defined file used by Automake to generate the Makefile.in file (the .am stands for automake). GNU Automake is a tool for automatically generating Makefile.in files compliant with the GNU Coding Standards (See more). It is used alongside Autoconf, an extensible package of M4 macros that produces shell scripts to automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention. Autoconf creates a configuration script for a package from a template file that lists the operating system features the package can use, in the form of M4 macro calls.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| Makefile for HTTrack | Makefile.am | Makefile.am | 196.0 B |
.c is the default extension for source files written in the [C programming language](https://en.wikipedia.org/wiki/C_(programming_language)). These are plain-text files that a C compiler later translates into machine code.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| HTTrack library example .c file, distributed under the GNU General Public License, Copyright (C) Xavier Roche and other contributors | example.c | example.c | 7.65 KB |
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A simple PHP file that outputs an option box | PHP-OptionSelect.php | PHP-OptionSelect.php | 327.0 B |
Bink Video is a proprietary file format (extensions .bik and .bk2) for video developed by RAD Game Tools. It has been primarily used for full-motion video sequences in video games, and has been used in games for Windows, Mac OS, Xbox 360, Xbox, GameCube, Wii, PlayStation 3, PlayStation 2, Dreamcast, Nintendo DS, and PSP. See more at Bink Video
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| The Sierra trademark presentation from a classic 2000s CD-ROM | Sierra_Logo.bik | Sierra_Logo.bik | 3.91 MB |
MPEG-4 Part 14 or MP4 is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. See more here
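An MP4 file is a sequence of boxes (atoms). Each box begins with a 32-bit big-endian size and a 4-character type. A size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. The sketch below lists the top-level boxes, such as `ftyp`, `moov` and `mdat`, without descending into them. `top_level_boxes` and `box` are hypothetical helpers, and the stream is synthetic test data.

```python
import io
import struct


def top_level_boxes(stream):
    """Yield (type, size) for each top-level box of an MP4 (ISO BMFF) stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:                        # 64-bit "largesize" follows the type
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        elif size == 0:                      # box runs to the end of the file
            rest = stream.read()
            yield box_type.decode("latin-1"), header_len + len(rest)
            return
        if size < header_len:
            raise ValueError(f"corrupt box size {size}")
        stream.seek(size - header_len, io.SEEK_CUR)   # skip the payload
        yield box_type.decode("latin-1"), size


def box(kind: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (synthetic test data only)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


sample = (box(b"ftyp", b"isom\x00\x00\x02\x00isomiso2")
          + box(b"free", b"")
          + struct.pack(">I4s", 0, b"mdat") + b"abc")
print(list(top_level_boxes(io.BytesIO(sample))))
```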
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| The sample introduction video from the Nextcloud open-source platform | [Nextcloud intro.mp4](<https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples/MP4/Nextcloud intro.mp4>) | Nextcloud intro.mp4 | 3.78 MB |
.blend is the default file format for Blender and can pack multiple scenes into a single file. The best place to find Blender sample files is blender.org/download/demo-files/, though they are offered in .zip containers. Some independent samples are listed below.
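Blender 2.x files begin with a 12-byte header:

- `BLENDER`
- a pointer-size flag (`_` for 4 bytes, `-` for 8 bytes)
- an endianness flag (`v` for little-endian, `V` for big-endian)
- a 3-digit version, such as `282`

The sketch below parses only that classic header. It assumes an uncompressed file, because .blend files can also be saved compressed, and newer Blender releases may use a different header layout. `parse_blend_header` is a hypothetical helper.

```python
POINTER_SIZES = {b"_": 4, b"-": 8}
ENDIANNESS = {b"v": "little", b"V": "big"}


def parse_blend_header(header: bytes):
    """Parse a classic 12-byte .blend header such as b'BLENDER-v282'."""
    if len(header) < 12 or header[:7] != b"BLENDER":
        return None            # not an uncompressed classic .blend (it may be compressed)
    pointer_size = POINTER_SIZES.get(header[7:8])
    endianness = ENDIANNESS.get(header[8:9])
    if pointer_size is None or endianness is None:
        return None
    version = header[9:12].decode("ascii")   # e.g. "282" for Blender 2.82
    return pointer_size, endianness, version


print(parse_blend_header(b"BLENDER-v282"))
```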
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A simple file containing the default Blender object, the cube | [cube.blend](<https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples/BLEND/cube.blend copy.backup>) | cube.blend | 135.0 B | 2.82 |
| A simple file containing the default Blender object, the cube | cube.blend | cube.blend | 544.47 KB | 2.82 |
.fbx (Filmbox) is a proprietary file format owned by Autodesk since 2006. It is currently one of the main 3D exchange formats, used by many 3D tools¹ ². FBX has a text-based (ASCII) and a binary version. There is no known public documentation available; notes on the inner workings of the format are provided in the following links:
- Blender Foundation, FBX Text-Based and Binary File Structure: Original post | Archive.org
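Community notes such as the Blender Foundation write-up describe the binary FBX header as follows:

- the 21-byte magic string `Kaydara FBX Binary` (with two spaces and a NUL terminator)
- two bytes, `0x1A 0x00`
- a little-endian 32-bit version number, for example 7400 for FBX 7.4

The sketch below relies on that reverse-engineered layout, not on any official specification. It treats anything without the magic string as "not binary FBX", which includes ASCII FBX. `fbx_binary_version` is a hypothetical helper, and the header it parses is synthetic.

```python
import struct

# 21-byte magic: note the two spaces before the NUL terminator.
FBX_BINARY_MAGIC = b"Kaydara FBX Binary  \x00"


def fbx_binary_version(header: bytes):
    """Return the version of a binary FBX file (e.g. 7400 for 7.4), or None."""
    if len(header) < 27 or not header.startswith(FBX_BINARY_MAGIC):
        return None            # ASCII FBX or some other file
    # After the magic come two bytes (0x1A 0x00), then a little-endian uint32 version.
    return struct.unpack_from("<I", header, 23)[0]


synthetic = FBX_BINARY_MAGIC + b"\x1a\x00" + struct.pack("<I", 7400)
print(fbx_binary_version(synthetic))
```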
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A simple 3D cube exported as FBX by Blender 2.82 | cube.fbx | cube.fbx | 25.7 KB | Kaydara FBX Binary |
X3D is a royalty-free ISO/IEC standard for declaratively representing 3D computer graphics. File format support includes XML, ClassicVRML, Compressed Binary Encoding (CBE) and a draft JSON encoding. See more at x3d
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A simple file containing the default Blender object, the cube, exported from Blender 2.92 | cube.x3d | cube.x3d | 3.59 KB |
.doc and .docx are the file formats used by Microsoft Word.
MIME types (.doc): application/doc, application/ms-doc, application/msword; (.docx): application/vnd.openxmlformats-officedocument.wordprocessingml.document
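A .docx file is a ZIP package whose main body lives in `word/document.xml`, with text stored in `w:t` runs inside `w:p` paragraphs. Legacy binary .doc files do not work this way. The sketch below extracts plain text from a .docx using only the standard library. It ignores headers, footnotes, tables and formatting. `docx_text` is a hypothetical helper, and the document it reads is a synthetic minimal package, not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(stream) -> str:
    """Join the text runs (w:t) of word/document.xml, one line per paragraph (w:p)."""
    with zipfile.ZipFile(stream) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Minimal synthetic package containing only word/document.xml.
doc_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Lorem ipsum</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>dolor sit amet</w:t></w:r></w:p></w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("word/document.xml", doc_xml)

text = docx_text(buffer)
print(text)
```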
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| One Page Document of Lorem Ipsum | DOC_LoremIpsum_OnePage.docx | DOC_LoremIpsum_OnePage.docx | 8.81 KB |
A GEDCOM (.ged) file is a plain text file containing genealogical information about individuals, and metadata linking these records together. This data model is based on the nuclear family and the individual. This contrasts with evidence-based models, where data is structured to reflect the supporting evidence. In the GEDCOM lineage-linked data model, all data is structured to reflect the believed reality, that is, actual (or hypothesized) nuclear families and individuals. Source
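Each GEDCOM line has the form `level [@XREF@] TAG [value]`. Level-0 lines start a record, such as `INDI` for an individual or `FAM` for a family, and higher levels belong to the record above them. The simplified sketch below groups lines into records. It ignores `CONC`/`CONT` continuation lines and character-set handling. `parse_gedcom` is a hypothetical helper, and the names in the sample are made up rather than taken from the sample file.

```python
import re

# level, optional @XREF@, tag, optional value
GEDCOM_LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def parse_gedcom(text: str):
    """Group each level-0 record with its subordinate (level > 0) lines."""
    records = []
    for raw in text.splitlines():
        match = GEDCOM_LINE.match(raw)
        if not match:
            continue                                   # skip blank or malformed lines
        level, xref, tag, value = match.groups()
        if int(level) == 0:
            records.append({"xref": xref, "tag": tag, "lines": []})
        elif records:
            records[-1]["lines"].append((int(level), tag, value or ""))
    return records


sample = """0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 TRLR"""

for record in parse_gedcom(sample):
    print(record["xref"], record["tag"], record["lines"])
```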
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A sample file generated on Ancestry.com containing a simple family structure. | [Surname Family Tree.ged](<https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples/GEDCOM/Surname Family Tree.ged>) | Surname Family Tree.ged | 979.0 B | 5.5 |
.html is the extension for Hypertext Markup Language files. HTML is the standard markup language for documents designed to be displayed in a web browser, and virtually every page on the web is built with it.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A simple HTML page listing some Lorem Ipsum paragraphs | HTMLLoremIpsumOnePage.html | HTMLLoremIpsumOnePage.html | 10.72 KB | |
| A slimmed version of the sample Lorem Ipsum HTML page. | HTMLLoremIpsumOnePage.min.html | HTMLLoremIpsumOnePage.min.html | 6.83 KB |
The Portable Document Format (PDF) is a file format developed to present documents independently of hardware, software and operating system. This format is widely used and has several versions (as of Oct. 2020, 7 revisions in total). See ISO 32000-1:2008 for the PDF specification.
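The Version column in the table corresponds to the `%PDF-x.y` header at the start of each file. The sketch below reads that header, searching the first 1024 bytes to tolerate leading junk. Be aware that from PDF 1.4 onward the document catalog may carry a `/Version` entry that overrides the header, so the header alone is only an approximation. `pdf_header_version` is a hypothetical helper.

```python
import re

HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes):
    """Return the version in the '%PDF-x.y' header (searched in the first 1024 bytes)."""
    match = HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


print(pdf_header_version(b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"))
```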
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| Universal Declaration of Human Rights, PT-BR version | Declaração_Universal_Direitos_Humanos.pdf | Declaração_Universal_Direitos_Humanos.pdf | 48.89 KB | 1.5 |
| One line PDF | PDF_HelloWorld_OneLine_1.5.pdf | PDF_HelloWorld_OneLine_1.5.pdf | 10.39 KB | 1.5 |
| One-page formatted Lorem Ipsum article | PDF_LoremIpsum_OnePage_1.5.pdf | PDF_LoremIpsum_OnePage_1.5.pdf | 36.64 KB | 1.5 |
| Two page Lorem Ipsum document | PDF_LoremIpsum_TwoPages_1.4.pdf | PDF_LoremIpsum_TwoPages_1.4.pdf | 44.55 KB | 1.4 |
| Two page Lorem Ipsum document | PDF_LoremIpsum_TwoPages_1.5.pdf | PDF_LoremIpsum_TwoPages_1.5.pdf | 41.93 KB | 1.5 |
The Rich Text Format (RTF) is a proprietary document file format with a published specification, developed by Microsoft Corporation and used for word processing.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| DaVinci Resolve 17 Readme File | DavinceResolve17_ReadMe.rtf | DavinceResolve17_ReadMe.rtf | 21.51 KB | 17 |
A text file is one of the simplest file structures: it is structured as a sequence of lines.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| 59,641 digits of Pi | TXT_DigitsofPi.txt | TXT_DigitsofPi.txt | 58.25 KB | |
| 3330 characters of Lorem Ipsum | TXT_LoremIpsum.txt | TXT_LoremIpsum.txt | 3.25 KB | Latin |
| A 10,000-word list (EN-US) by Eric Price. Original source available here | TXT_wordlist_ENUS_10000.txt | TXT_wordlist_ENUS_10000.txt | 74.1 KB | EN-US |
DRP stands for DaVinci Resolve Project and is the default project file format when exporting projects from the software's database.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A Sample project file containing few seconds of Color Bar HLG | Sample_Color_Bar_HLG.drp | Sample_Color_Bar_HLG.drp | 27.2 KB | 17.1 |
.epub is a container format for digital publications, widely used for e-book distribution.
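An EPUB file is a ZIP container. The packaging rules require the first entry to be an uncompressed file named `mimetype` containing `application/epub+zip`, which lets tools identify the file without parsing it. The sketch below checks only that rule, so passing it does not prove a book is valid. `looks_like_epub` is a hypothetical helper, and the container it builds is synthetic.

```python
import io
import zipfile


def looks_like_epub(stream) -> bool:
    """Check the OCF rule: the first entry is an uncompressed 'mimetype' file."""
    with zipfile.ZipFile(stream) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            return False
        first = entries[0]
        return (first.compress_type == zipfile.ZIP_STORED
                and zf.read(first) == b"application/epub+zip")


# Build a minimal synthetic container (not a complete, valid e-book).
book = io.BytesIO()
with zipfile.ZipFile(book, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
    zf.writestr("META-INF/container.xml", "<container/>")

print(looks_like_epub(book))
```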
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| One page e-book document of the Lorem Ipsum | EPUB_LoremIpsum_OnePage.epub | EPUB_LoremIpsum_OnePage.epub | 3.75 KB |
.mobi is a container format for digital publications in the Kindle e-reader ecosystem.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| Ebook Dracula by Bram Stoker hosted on Project Gutenberg, this version contains no images | Dracula by Bram Stoker(NoImages).url | Dracula by Bram Stoker(NoImages).url | Unavailable | |
| Ebook Dracula by Bram Stoker hosted on Project Gutenberg | Dracula by Bram Stoker(WithImages) copy.url | Dracula by Bram Stoker(WithImages) copy.url | Unavailable |
.inf or Setup Information file is a plain-text file used by Microsoft Windows for the installation of software and drivers.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| autorun.inf is a common file found on CD-ROMs, describing the procedure to auto-launch the CD contents | autorun.inf | autorun.inf | 41.0 B |
.ini files are used by applications and the Windows operating system to store initialization parameters. The information is stored in associative arrays of keys and values, grouped under section headers such as `[section]`, with lines like `name=value ; comment text`.
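Python's standard `configparser` module reads this key/value layout. The sketch parses a synthetic INI snippet and enables `;` inline comments, which configparser does not strip by default. It also disables `%` interpolation so that values like `%SystemRoot%` stay literal. Windows' own INI handling differs in details, so treat this as an approximation. For example, configparser lowercases keys and, by default, rejects duplicate keys.

```python
import configparser

sample = """
[section]
name=value ; comment text
; a full-line comment
other = 42
"""

# interpolation=None keeps values such as %SystemRoot% literal;
# inline comments are only stripped when a prefix is configured explicitly.
parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=(";",))
parser.read_string(sample)
print(parser["section"]["name"])          # value
print(parser.getint("section", "other"))  # 42
```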
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| The application information file for the portable version of CPU-Z, distributed by PortableApps.com | appinfo.ini | appinfo.ini | 463.0 B | |
| Google Picasa sample backup configuration file | googlepicasa.picasa.ini | googlepicasa.picasa.ini | 273.0 B | |
| PortableApps Installer license.ini | license.ini | license.ini | 44.0 B | |
| An InstallShield setup file from the LS-USBMX 1/2/3 Steering Wheel W/Vibration driver CD-ROM | setup.ini | setup.ini | 358.0 B | |
| A Windows .ini file used by the OS to store information about the arrangement of a Windows folder. | windows-desktop.ini | windows-desktop.ini | 249.0 B |
.mta files are index files created by Samsung Allshare and Samsung Kies to enable navigation of video chapters. The file is XML-based and contains thumbnails in base64 format.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| An .mta sample built from an .avi video file by Samsung Kies. Samsung video metadata file generated by SMVideoEngine (Samsung Metadata Video Engine) v1.0, June 2009 | MOV03439.AVI.mta | MOV03439.AVI.mta | 17.57 KB | 1.0 |
A .pp3 file is a text file of associative arrays used by the RawTherapee photo editor to store the edits you made to a photo.
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A sample file generated by RawTherapee version 5.8 | IMG_8181.CR2.pp3 | IMG_8181.CR2.pp3 | 12.17 KB | 346 |
A torrent file or meta-info file is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers, which are computers that help participants in the system find each other and form efficient distribution groups called swarms.[1] A torrent file does not contain the content to be distributed; it only contains information about those files, such as their names, folder structure and sizes, and cryptographic hash values for verifying file integrity. The term torrent may refer either to the metadata file or to the files downloaded, depending on the context. source
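Torrent files are encoded with bencoding:

- integers are written as `i42e`
- byte strings as `4:spam`
- lists as `l...e`
- dictionaries as `d...e`

A small recursive decoder is enough to inspect one, for example to read the `announce` URL or the `info` dictionary. This is a sketch that assumes the whole file fits in memory. `bdecode` is a hypothetical helper, and the sample bytes (tracker URL, file name, sizes) are made up.

```python
def bdecode(data: bytes):
    """Decode one complete bencoded value (the encoding used by .torrent files)."""
    def parse(i):
        c = data[i:i + 1]
        if c == b"i":                                  # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":                                  # list: l<items>e
            items, i = [], i + 1
            while data[i:i + 1] != b"e":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        if c == b"d":                                  # dictionary: d<key><value>...e
            result, i = {}, i + 1
            while data[i:i + 1] != b"e":
                key, i = parse(i)
                result[key], i = parse(i)
            return result, i + 1
        if c.isdigit():                                # byte string: <length>:<bytes>
            colon = data.index(b":", i)
            start = colon + 1
            length = int(data[i:colon])
            return data[start:start + length], start + length
        raise ValueError(f"invalid bencode at offset {i}")

    value, end = parse(0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


# Synthetic metainfo with a hypothetical tracker URL and file.
sample = (b"d8:announce18:http://example/ann4:infod6:lengthi12345e"
          b"4:name11:dracula.txt12:piece lengthi16384eee")
meta = bdecode(sample)
print(meta[b"announce"], meta[b"info"])
```

For real torrents, the `info` dictionary also contains a `pieces` byte string holding concatenated SHA-1 hashes. This decoder returns it as raw bytes.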
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A torrent file for the public-domain book Dracula by Bram Stoker, obtained from the University of Toronto - Robarts Library collection on Archive.org | draculabr00stokuoft_archive.torrent | draculabr00stokuoft_archive.torrent | 30.72 KB |
JPG is the extension used for image files compressed with the JPEG method. JPEG compression is lossy and supports multiple quality levels; the list below includes several quality settings.
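The quality setting is not stored as a number in a JPEG file. It only shows up indirectly, through the quantization tables, whereas the image dimensions sit in the start-of-frame (SOF) segment. The simplified marker walk below reads width and height from the first SOF segment. It assumes every segment before the SOF has a length field, which holds for typical files but is not a full JPEG parser. `jpeg_size` is a hypothetical helper, and the test bytes are synthetic.

```python
import struct

# Start-of-frame markers that carry image dimensions (0xC4, 0xC8, 0xCC are not frames).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes):
    """Return (width, height) from the first SOF segment of a JPEG, or None."""
    if data[:2] != b"\xff\xd8":                 # SOI (start of image) marker
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None                          # not at a marker; give up in this sketch
        marker = data[i + 1]
        if marker == 0xFF:                       # fill byte before a marker
            i += 1
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                return None
            # Segment layout: length (2), precision (1), height (2), width (2), ...
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length                          # skip marker bytes plus the segment


# Synthetic stream: SOI, an APP0 segment, a baseline SOF0 for 1920x1080, EOI.
app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + b"\x00" * 9
sof0 = b"\xff\xc0" + struct.pack(">HBHHB", 17, 8, 1080, 1920, 3) + b"\x00" * 9
fake_jpeg = b"\xff\xd8" + app0 + sof0 + b"\xff\xd9"
print(jpeg_size(fake_jpeg))
```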
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| Blank JPG Color Black, Quality 100%, Size: 1920x1080 | JPG_Black_100_1920x1080.jpg | JPG_Black_100_1920x1080.jpg | 24.01 KB | |
| Blank JPG Color Black, Quality 30%, Size: 1920x1080 | JPG_Black_30_1920x1080.jpg | JPG_Black_30_1920x1080.jpg | 12.23 KB | |
| Blank JPG Color Black, Quality 70%, Size: 1920x1080 | JPG_Black_70_1920x1080.jpg | JPG_Black_70_1920x1080.jpg | 12.23 KB | |
| Blank JPG Color White, Quality 100%, Size: 1920x1080 | JPG_BlankWhite_100_1920x1080.jpg | JPG_BlankWhite_100_1920x1080.jpg | 24.01 KB | |
| Blank JPG Color White, Quality 30%, Size: 1920x1080 | JPG_BlankWhite_30_1920x1080.jpg | JPG_BlankWhite_30_1920x1080.jpg | 12.23 KB | |
| Blank JPG Color White, Quality 70%, Size: 1920x1080 | JPG_BlankWhite_70_1920x1080.jpg | JPG_BlankWhite_70_1920x1080.jpg | 12.23 KB |
Portable Network Graphics (PNG) is a raster-graphics file format that supports lossless data compression.
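A PNG file is an 8-byte signature followed by chunks. Each chunk stores a length, a 4-byte type, the data, and a CRC-32 over type and data. The first chunk must be `IHDR`, which holds the width and height. Pixel data goes into `IDAT` chunks compressed with zlib. Blank images like the samples below therefore shrink to a few hundred bytes: uniform rows compress extremely well. The sketch below builds a valid 1x1 grayscale PNG with the standard library and reads its size back. `png_size` and `chunk` are hypothetical helpers.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Return (width, height) from the IHDR chunk, which must come first."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def chunk(kind: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


# IHDR: width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
# One scanline: filter byte 0 followed by one black pixel (0).
png = (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
       + chunk(b"IDAT", zlib.compress(b"\x00\x00")) + chunk(b"IEND", b""))
print(png_size(png))  # (1, 1)
```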
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| Blank PNG Color Black, Size: 1920x1080 | PNG_Black_1920x1080.png | PNG_Black_1920x1080.png | 381.0 B | |
| Blank PNG Color White, Size: 1920x1080 | PNG_Blank_White_1920x1080.png | PNG_Blank_White_1920x1080.png | 381.0 B |
A URL file is a shortcut file referenced by web browsers. It contains a web URL and may also store a reference to the favicon.ico icon file, which is displayed as the icon for the shortcut file. Creating a .url file on Windows is simple: drag the URL address from your browser window onto your desktop. (On a Mac, the same action creates a .webloc file.)
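Windows .url shortcuts use an INI-style layout with an `[InternetShortcut]` section and a `URL=` key, so Python's `configparser` can read them. The sketch below parses a synthetic shortcut, which is not necessarily byte-identical to the sample file. Interpolation is disabled so that `%` characters in URLs are left alone.

```python
import configparser

sample = """[InternetShortcut]
URL=https://www.google.com/
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(sample)
print(parser["InternetShortcut"]["URL"])  # https://www.google.com/
```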
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A sample shortcut file pointing to www.google.com | google.url | google.url | 55.0 B |
VersionInfo is a text file used by Windows 32-bit applications that contains version information. This information is language- and code-page-independent, and mostly describes the product, author, release, copyright and internal name, among many other attributes. The specification is available here
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| PythonUSBWebServer versioninfo file | VERSIONINFO_PYTHOUSBWEBSERVER | VERSIONINFO_PYTHOUSBWEBSERVER | 1.41 KB | |
| Apache Software Foundation SVN version info file | VERSIONINFO_SVN | VERSIONINFO_SVN | 1.39 KB |
.xmp files, the so-called sidecar files, are XML files used by darktable, a non-destructive image editor, to store information about the images as well as the full editing history without touching the original raw files. For a given source image, multiple editing versions, called duplicates, can co-exist, sharing the same input (raw) data but each having their own metadata, tags and history stack. Each duplicate is represented by a separate XMP sidecar file with a filename constructed in the form `<basename>_nn.<extension>.xmp`, where nn represents the (minimum two-digit) version number of that edit. Information for the initial edit (the duplicate with version number zero) is stored in the sidecar file `<basename>.<extension>.xmp`.
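darktable names its sidecars as follows:

- the initial edit (version 0) uses `<basename>.<extension>.xmp`, as in the `IMG_8366.CR2.xmp` sample
- duplicates use `<basename>_nn.<extension>.xmp`, with at least two digits for `nn`

The sketch below builds those names so a test can locate the sidecar for a given image. `darktable_sidecar` is a hypothetical helper, and it uses POSIX-style paths for predictable output.

```python
from pathlib import PurePosixPath


def darktable_sidecar(image_name: str, version: int = 0) -> str:
    """Return the darktable sidecar file name for a given duplicate version."""
    path = PurePosixPath(image_name)
    if version == 0:
        # Initial edit: <basename>.<extension>.xmp
        return str(path.with_name(path.name + ".xmp"))
    # Duplicates: <basename>_nn.<extension>.xmp, nn with at least two digits
    return str(path.with_name(f"{path.stem}_{version:02d}{path.suffix}.xmp"))


print(darktable_sidecar("IMG_8366.CR2"))     # IMG_8366.CR2.xmp
print(darktable_sidecar("IMG_8366.CR2", 1))  # IMG_8366_01.CR2.xmp
```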
| Description | Link | Name | Size | Version |
|---|---|---|---|---|
| A blank .xmp from a .cr2 raw file containing Adobe Color Presets | IMG_8366.CR2.xmp | IMG_8366.CR2.xmp | 975.0 B |
