Files Examples

Note: This is the file index containing the complete dataset of file examples. The dataset is also available as a JSON file here

Index of File Examples

This repository can be defined as:

  • A collection of file examples of different formats.
  • Samples of files and structures for everyday use.
  • A compendium of links of sample files throughout the internet.

The general idea is to provide an index of materials for those situations in software development or design where you might need to do some unit testing with real-world files.

The files are listed below by category and type/context.
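As a minimal sketch of how a test could locate one of these samples: the raw-file links that appear in this index follow the pattern `.../main//file-examples/<TYPE>/<file name>`. The helper name `sample_url` is hypothetical, and the sketch assumes every file follows the same layout as the links shown in this index.

```python
from urllib.parse import quote

# Base of the raw-file links that appear in this index.
BASE_URL = "https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples"


def sample_url(file_type: str, file_name: str) -> str:
    """Build the raw download URL for a sample file (hypothetical helper).

    Spaces and other unsafe characters are percent-encoded so the URL stays valid.
    """
    return f"{BASE_URL}/{quote(file_type)}/{quote(file_name)}"


print(sample_url("MP4", "Nextcloud intro.mp4"))
```

Percent-encoding matters here because several sample names contain spaces, which would otherwise break the link.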

Archive File Format

7Z

7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. See more at 7z

Description Link Name Size Version
A 7z archive of the Wikipedia.org homepage HTML file WikipediaHomePage.7z WikipediaHomePage.7z 52.47 KB

GZ

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems. See more at gzip

Description Link Name Size Version
A gzip archive of the HTML version of the page Wikipedia.org Wikipedia.html.gz Wikipedia.html.gz 56.4 KB
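A short sketch of a gzip round trip using Python's standard `gzip` module. The HTML payload is made up for illustration and is not the sample file itself.

```python
import gzip

# Made-up, repetitive HTML payload used only for illustration.
html = b"<html><body>" + b"<p>Wikipedia</p>" * 200 + b"</body></html>"

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
print(compressed[:2].hex())
print(len(html), "->", len(compressed), "bytes")
assert restored == html
```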

ISO

An optical disc image (or ISO image, from the ISO 9660 file system used with CD-ROM media) is a disk image that contains everything that would be written to an optical disc, disc sector by disc sector, including the optical disc file system. See more at ISO

Description Link Name Size Version
An ISO image of a CD-ROM containing a copy of the wikipedia.org HTML page inside one folder wikipedia-org-one-page.iso wikipedia-org-one-page.iso 216.0 KB

TAR

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. A tar archive consists of a series of file objects. Each file object includes any file data and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes. See more at tar

Description Link Name Size Version
A tar archive containing two files: the HTML versions of the Wikipedia.org homepage and of the page www.isitchristmas.com Wikipedia.tar Wikipedia.tar 168.0 KB
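The 512-byte layout described above can be checked with Python's standard `tarfile` module. This is a sketch on an in-memory archive with a made-up member name and content; `build_tar` and `padded_size` are hypothetical helper names.

```python
import io
import tarfile

BLOCK = 512  # tar header records and data blocks are 512 bytes each


def build_tar(files: dict) -> bytes:
    """Create an uncompressed tar archive in memory from name -> content pairs."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def padded_size(length: int) -> int:
    """Round a data length up to the next multiple of 512 bytes."""
    return -(-length // BLOCK) * BLOCK


data = build_tar({"a.html": b"x" * 700})
header = data[:BLOCK]        # the header record precedes the file data
print(header[:6])            # the member name sits at the start of the header
print(padded_size(700))      # 1024
print(len(data) % BLOCK)     # 0
```

The 700 bytes of data occupy two blocks: the content itself followed by zero padding up to 1024 bytes.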

WARC

This format is used specifically for archiving web crawls, and is a revision of the Internet Archive's ARC File Format, specifying a method for combining multiple digital resources into an aggregate archive file together with related information.

Description Link Name Size Version
One page WARC archive of the Wikipedia.org homepage WikipediaOrg-20201212031238412.warc WikipediaOrg-20201212031238412.warc 421.33 KB

ZIM

The ZIM file format is an open file format that stores wiki content for offline usage. Its primary focus is the contents of Wikipedia and other Wikimedia projects. The format allows for the compression of articles, features a full-text search index and native category and image handling similar to MediaWiki, and the entire file is easily indexable and readable using a program like Kiwix – unlike native Wikipedia XML database dumps. (source) The Kiwix open source project offers a collection of ZIM archives here

Description Link Name Size Version
A zim archive containing every book from the English Open Source Collection of Project Gutenberg (hosted by Kiwix) gutenberg_en_all_2020-12.zim.url gutenberg_en_all_2020-12.zim.url 61GB 2020-12
Top 100 articles from Wikipedia EN-US as a zim file, hosted by Kiwix wikipedia_en_100_2020-10.zim.url wikipedia_en_100_2020-10.zim.url 304M 2020-10
Every page and picture from Wikipedia EN-US hosted by Kiwix wikipedia_en_all_maxi_2020-11.zim.url wikipedia_en_all_maxi_2020-11.zim.url 94GB 2020-11
A zim archive containing every page from Wikipedia EN-US without pictures (hosted by Kiwix) wikipedia_en_all_nopic_2020-10.zim.url wikipedia_en_all_nopic_2020-10.zim.url 39GB 2020-10

ZIP

ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.

Description Link Name Size Version
A zip archive of the HTML version of the page Wikipedia.org WikipediaOrg.zip WikipediaOrg.zip 56.52 KB ZIP
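A brief sketch with Python's standard `zipfile` module, showing that members of one archive may be either compressed or stored unchanged. The member names and contents are made up for illustration.

```python
import io
import zipfile

# Made-up, repetitive HTML payload used only for illustration.
html = b"<html><body>" + b"<p>Wikipedia</p>" * 200 + b"</body></html>"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    # Uses the archive default (DEFLATE), so this member is compressed.
    archive.writestr("site/index.html", html)
    # Overrides the default: this member is stored without compression.
    archive.writestr("site/notes.txt", b"stored as-is", compress_type=zipfile.ZIP_STORED)

with zipfile.ZipFile(buffer) as archive:
    infos = {info.filename: info for info in archive.infolist()}
    restored = archive.read("site/index.html")

for name, info in infos.items():
    print(name, info.file_size, info.compress_size)
assert restored == html
```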

Programming Languages

MAKEFILE | AM

Makefile.am is a programmer-defined file used by automake to generate the Makefile.in file (the .am stands for automake). GNU Automake is a tool for automatically generating Makefile.in files compliant with the GNU Coding Standards (See more). Automake is used together with Autoconf, which is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention. Autoconf creates a configuration script for a package from a template file that lists the operating system features that the package can use, in the form of M4 macro calls.

Description Link Name Size Version
Makefile for HTTrack Makefile.am Makefile.am 196.0 B

C

.c is the default extension for the [C Programming Language](https://en.wikipedia.org/wiki/C_(programming_language)). These are text files that are later compiled into machine code by a C compiler.

Description Link Name Size Version
HTTrack library example .c file, distributed under the GNU General Public License, Copyright (C) Xavier Roche and other contributors example.c example.c 7.65 KB

PHP

Description Link Name Size Version
A simple PHP file that outputs an option box PHP-OptionSelect.php PHP-OptionSelect.php 327.0 B

Video

BIK | BINK

Bink Video is a proprietary file format (extensions .bik and .bk2) for video developed by RAD Game Tools. It has been primarily used for full-motion video sequences in video games, and has been used in games for Windows, Mac OS, Xbox 360, Xbox, GameCube, Wii, PlayStation 3, PlayStation 2, Dreamcast, Nintendo DS, and PSP. See more at Bink Video

Description Link Name Size Version
The Sierra trademark presentation from a classic 2000s CD-ROM Sierra_Logo.bik Sierra_Logo.bik 3.91 MB

MP4

MPEG-4 Part 14 or MP4 is a digital multimedia container format most commonly used to store video and audio, but it can also be used to store other data such as subtitles and still images. See more here

Description Link Name Size Version
The sample introduction video from the Nextcloud open source platform [Nextcloud intro.mp4](https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples/MP4/Nextcloud%20intro.mp4) Nextcloud intro.mp4 3.78 MB

3D Modeling

BLEND | Blender

.blend is the default file format for Blender and can pack multiple scenes into a single file. The best place to find Blender sample files is blender.org/download/demo-files/, though they are offered in .zip containers. Some independent samples are listed below.

Description Link Name Size Version
A simple 535 KB file containing the default Blender object, the cube [cube.blend](https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples/BLEND/cube.blend%20copy.backup) cube.blend 135.0 B 2.82
A simple 535 KB file containing the default Blender object, the cube cube.blend cube.blend 544.47 KB 2.82

FBX

.fbx (Filmbox) is a proprietary file format owned by Autodesk since 2006. It is currently one of the main 3D exchange formats, used by many 3D tools¹ ². FBX has a text-based (ASCII) version and a binary version. There is no known public documentation available; notes on the inner workings of the format are provided in the following links:

Description Link Name Size Version
A simple 3D cube exported as FBX by Blender 2.82 cube.fbx cube.fbx 25.7 KB Kaydara FBX Binary

X3D

X3D is a royalty-free ISO/IEC standard for declaratively representing 3D computer graphics. File format support includes XML, ClassicVRML, Compressed Binary Encoding (CBE) and a draft JSON encoding. See more at x3d

Description Link Name Size Version
A simple file containing the default Blender object, the cube, exported from Blender 2.92 cube.x3d cube.x3d 3.59 KB

Documents

DOC

.doc and .docx are the file formats created by Microsoft Word. MIME types: application/doc, application/ms-doc, application/msword

Description Link Name Size Version
One Page Document of Lorem Ipsum DOC_LoremIpsum_OnePage.docx DOC_LoremIpsum_OnePage.docx 8.81 KB

GEDCOM

A plain text file containing genealogical information about individuals, and metadata linking these records together. This data model is based on the nuclear family and the individual. This contrasts with evidence-based models, where data is structured to reflect the supporting evidence. In the GEDCOM lineage-linked data model, all data is structured to reflect the believed reality, that is, actual (or hypothesized) nuclear families and individuals. Source

Description Link Name Size Version
A sample file generated on Ancestry.com containing a simple family structure. [Surname Family Tree.ged](https://raw.githubusercontent.com/thethales/File-Examples/main//file-examples/GEDCOM/Surname%20Family%20Tree.ged) Surname Family Tree.ged 979.0 B 5.5

HTML

.html is the extension for Hypertext Markup Language files. HTML is the standard markup language for documents designed to be displayed in a web browser, and web pages are built on it.

Description Link Name Size Version
A simple HTML page listing some Lorem Ipsum paragraphs HTMLLoremIpsumOnePage.html HTMLLoremIpsumOnePage.html 10.72 KB
A slimmed version of the sample Lorem Ipsum HTML page. HTMLLoremIpsumOnePage.min.html HTMLLoremIpsumOnePage.min.html 6.83 KB

PDF

The Portable Document Format (PDF) is a file format developed to present documents independent of hardware, software and operating system. This format is widely used and has several versions (as of Oct. 2020, 7 revisions in total). See ISO 32000-1:2008 for the PDF specification.

Description Link Name Size Version
Human Rights Declaration PT-BR version Declaração_Universal_Direitos_Humanos.pdf Declaração_Universal_Direitos_Humanos.pdf 48.89 KB 1.5
One line PDF PDF_HelloWorld_OneLine_1.5.pdf PDF_HelloWorld_OneLine_1.5.pdf 10.39 KB 1.5
One page Lorem Ipsum formatted article PDF_LoremIpsum_OnePage_1.5.pdf PDF_LoremIpsum_OnePage_1.5.pdf 36.64 KB 1.5
Two page Lorem Ipsum document PDF_LoremIpsum_TwoPages_1.4.pdf PDF_LoremIpsum_TwoPages_1.4.pdf 44.55 KB 1.4
Two page Lorem Ipsum document PDF_LoremIpsum_TwoPages_1.5.pdf PDF_LoremIpsum_TwoPages_1.5.pdf 41.93 KB 1.5
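The Version column above corresponds to the version declared in the first bytes of a PDF file, for example `%PDF-1.5`. A minimal sketch of reading it follows; `pdf_version` is a hypothetical helper, and it only inspects the header, so it ignores any version override a document may declare elsewhere.

```python
import re
from typing import Optional


def pdf_version(data: bytes) -> Optional[str]:
    """Return the version declared in a PDF header such as b'%PDF-1.5', or None."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


print(pdf_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))  # 1.4
print(pdf_version(b"not a pdf"))                      # None
```

In a test, the bytes would come from reading the start of one of the sample files, for example `open(path, "rb").read(16)`.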

RTF

The Rich Text Format (RTF) is a proprietary document file format with a published specification, developed by Microsoft Corporation and used for word processing.

Description Link Name Size Version
DaVinci Resolve 17 Readme File DavinceResolve17_ReadMe.rtf DavinceResolve17_ReadMe.rtf 21.51 KB 17

TXT

A text file is one of the simplest file structures: it is structured as a sequence of lines.

Description Link Name Size Version
59641 Digits of Pi TXT_DigitsofPi.txt TXT_DigitsofPi.txt 58.25 KB
3330 characters of Lorem Ipsum TXT_LoremIpsum.txt TXT_LoremIpsum.txt 3.25 KB Latin
A 10,000-word list (EN-US) by Eric Price. Original source available here TXT_wordlist_ENUS_10000.txt TXT_wordlist_ENUS_10000.txt 74.1 KB EN-US

Unspecified

DRP

DRP stands for DaVinci Resolve Project and is the default project file when exporting projects from the software's database.

Description Link Name Size Version
A sample project file containing a few seconds of Color Bar HLG Sample_Color_Bar_HLG.drp Sample_Color_Bar_HLG.drp 27.2 KB 17.1

Ebook

EPUB

.epub is a container for digital publications. Widely used for e-book distribution.

Description Link Name Size Version
One page e-book document of the Lorem Ipsum EPUB_LoremIpsum_OnePage.epub EPUB_LoremIpsum_OnePage.epub 3.75 KB

MOBI

.mobi is a container for digital publications in the Kindle e-reader ecosystem.

Description Link Name Size Version
Ebook Dracula by Bram Stoker hosted on Project Gutenberg, this version contains no images Dracula by Bram Stoker(NoImages).url Dracula by Bram Stoker(NoImages).url Unavailable
Ebook Dracula by Bram Stoker hosted on Project Gutenberg Dracula by Bram Stoker(WithImages) copy.url Dracula by Bram Stoker(WithImages) copy.url Unavailable

Configuration Files

INF

.inf or Setup Information file is a plain-text file used by Microsoft Windows for the installation of software and drivers.

Description Link Name Size Version
autorun.inf is a common file found in CD-ROMs for describing the procedures to auto launch the CD contents autorun.inf autorun.inf 41.0 B

INI

.ini files are used by applications and the Windows operating system for storing initialization parameters. The information is stored in associative arrays, with a key and a value, as such:

    [section]
    name=value ; comment text

Description Link Name Size Version
The application information file for the portable version of CPU-Z, distributed by PortableApps.com appinfo.ini appinfo.ini 463.0 B
Google Picasa sample backup configuration file googlepicasa.picasa.ini googlepicasa.picasa.ini 273.0 B
PortableApps Installer license.ini license.ini license.ini 44.0 B
An InstallShield setup file from the LS-USBMX 1/2/3 Steering Wheel W/Vibration driver CD-ROM setup.ini setup.ini 358.0 B
A Windows .ini file used by the OS to store information about the arrangement of a Windows folder. windows-desktop.ini windows-desktop.ini 249.0 B
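The section/key/value layout described above can be read with Python's standard `configparser` module. This is a sketch on the inline example from the description; whether `;` after a value counts as a comment varies between applications, so enabling it here is an assumption.

```python
import configparser

SAMPLE = """
[section]
name=value ; comment text
; a full-line comment
"""

# Treating ';' after a value as a comment is opt-in in configparser.
parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
parser.read_string(SAMPLE)

print(parser.sections())           # ['section']
print(parser["section"]["name"])   # value
```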

MTA

.mta files are index files created by Samsung Allshare and Samsung Kies to enable navigation of video chapters. The file is XML-based and contains thumbnails in base64 format.

Description Link Name Size Version
An .mta sample built from a .avi video file by Samsung Kies. Samsung video metadata file generated by SMVideoEngine (Samsung Metadata Video Engine) v1.0, June 2009 MOV03439.AVI.mta MOV03439.AVI.mta 17.57 KB 1.0

PP3 | RawTherapee

The .pp3 file is a text file of associative arrays used to store the edits you made to your photo in the RawTherapee photo editor.

Description Link Name Size Version
A sample file generated on RawTherapee version 5.8 IMG_8181.CR2.pp3 IMG_8181.CR2.pp3 12.17 KB 346

TORRENT

A torrent file or meta-info file is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers, which are computers that help participants in the system find each other and form efficient distribution groups called swarms.[1] A torrent file does not contain the content to be distributed; it only contains information about those files, such as their names, folder structure, and sizes obtained via cryptographic hash values for verifying file integrity. The term torrent may refer either to the metadata file or to the files downloaded, depending on the context. source

Description Link Name Size Version
A Torrent File of the public domain book: Dracula by Stoker, Bram ; Obtained from the University of Toronto - Robarts Library Archive.org website draculabr00stokuoft_archive.torrent draculabr00stokuoft_archive.torrent 30.72 KB

Images

JPG

JPG is the extension used for image files compressed with the JPEG method. Images produced with this method are lossy and can be saved at multiple quality levels; the list below includes some quality options.

Description Link Name Size Version
Blank JPG Color Black, Quality 100%, Size: 1920x1080 JPG_Black_100_1920x1080.jpg JPG_Black_100_1920x1080.jpg 24.01 KB
Blank JPG Color Black, Quality 30%, Size: 1920x1080 JPG_Black_30_1920x1080.jpg JPG_Black_30_1920x1080.jpg 12.23 KB
Blank JPG Color Black, Quality 70%, Size: 1920x1080 JPG_Black_70_1920x1080.jpg JPG_Black_70_1920x1080.jpg 12.23 KB
Blank JPG Color White, Quality 100%, Size: 1920x1080 JPG_BlankWhite_100_1920x1080.jpg JPG_BlankWhite_100_1920x1080.jpg 24.01 KB
Blank JPG Color White, Quality 30%, Size: 1920x1080 JPG_BlankWhite_30_1920x1080.jpg JPG_BlankWhite_30_1920x1080.jpg 12.23 KB
Blank JPG Color White, Quality 70%, Size: 1920x1080 JPG_BlankWhite_70_1920x1080.jpg JPG_BlankWhite_70_1920x1080.jpg 12.23 KB

PNG

Portable Network Graphics (PNG) is a raster-graphics file format that supports lossless data compression.

Description Link Name Size Version
Blank PNG Color Black, Size: 1920x1080 PNG_Black_1920x1080.png PNG_Black_1920x1080.png 381.0 B
Blank PNG Color White, Size: 1920x1080 PNG_Blank_White_1920x1080.png PNG_Blank_White_1920x1080.png 381.0 B
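To illustrate why a blank 1920x1080 PNG can be so small, the sketch below builds a single-color grayscale PNG in memory and reads its dimensions back from the IHDR chunk. The helper names are hypothetical, and the generated file is not byte-identical to the samples above, which may use a different color type.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, payload: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, payload, CRC-32 of type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def blank_png(width: int, height: int, gray: int) -> bytes:
    """Build an 8-bit grayscale PNG filled with one shade (0 = black, 255 = white)."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, no interlace.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by one byte per pixel.
    raw = (b"\x00" + bytes([gray]) * width) * height
    idat = zlib.compress(raw, 9)  # lossless DEFLATE compression
    return PNG_SIGNATURE + chunk(b"IHDR", header) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def png_size(data: bytes) -> tuple:
    """Read width and height from the IHDR chunk, which always comes first."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


image = blank_png(1920, 1080, 0)
print(png_size(image))   # (1920, 1080)
print(len(image), "bytes for", 1920 * 1080, "pixels")
```

Because every scanline is identical, the roughly two million bytes of pixel data compress down to a few kilobytes without losing any information.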

Shortcut

URL

A URL file is a shortcut file referenced by web browsers. It contains a web URL and may also store a reference to the favicon.ico icon file, which is displayed as the icon for the shortcut file. Creating a .url file on Windows is simple: drag the URL from your browser window onto your desktop. (On a Mac, that action creates a .webloc file.)

Description Link Name Size Version
A sample shortcut file pointing to www.google.com google.url google.url 55.0 B
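A .url shortcut is typically an INI-style text file with an `[InternetShortcut]` section, so it can be read with Python's standard `configparser` module. The content below is an assumed typical layout, not a copy of the sample file, and `shortcut_target` is a hypothetical helper name.

```python
import configparser

# Typical layout of a Windows .url shortcut (assumed; not the sample's exact content).
SHORTCUT = "[InternetShortcut]\nURL=https://www.google.com/\n"


def shortcut_target(text: str) -> str:
    """Return the URL stored in the [InternetShortcut] section of a .url file."""
    # Interpolation is disabled because URLs may legitimately contain '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names such as URL exactly as written
    parser.read_string(text)
    return parser["InternetShortcut"]["URL"]


print(shortcut_target(SHORTCUT))  # https://www.google.com/
```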

WINDOWS

VersionInfo

VersionInfo is a text file used by 32-bit Windows applications that contains version information. This information is language and code page independent, and mostly describes the product, author, release, copyright and internal names, among many other attributes. The specification is available here

Description Link Name Size Version
PythonUSBWebServer versioninfo file VERSIONINFO_PYTHOUSBWEBSERVER VERSIONINFO_PYTHOUSBWEBSERVER 1.41 KB
Apache Software Foundation SVN version info file VERSIONINFO_SVN VERSIONINFO_SVN 1.39 KB

XML

XMP | Darktable

.xmp files, the so-called sidecar files, are XML files used by Darktable, a non-destructive image editor, to store information about the images as well as the full editing history without touching the original raw files. For a given source image, multiple editing versions, called duplicates, can co-exist, sharing the same input (raw) data but each having their own metadata, tags and history stack. Each duplicate is represented by a separate XMP sidecar file with a filename constructed in the form <basename>_nn.<extension>.xmp, where nn represents the (minimum two-digit) version number of that edit. Information for the initial edit – the duplicate with version number zero – is stored in the sidecar file <basename>.<extension>.xmp.

Description Link Name Size Version
A Blank .xmp from a .cr2 raw file containing Adobe Color Presets IMG_8366.CR2.xmp IMG_8366.CR2.xmp 975.0 B
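The sidecar naming rule can be sketched as a small function. It assumes the name is built from the image's base name and extension with `_nn` inserted between them, which matches the sample file name IMG_8366.CR2.xmp for version zero; `sidecar_name` is a hypothetical helper, not part of Darktable.

```python
def sidecar_name(image_name: str, version: int = 0) -> str:
    """Return the assumed sidecar file name for a given duplicate of an image.

    Version 0 (the initial edit) maps to '<image>.xmp'; later duplicates insert
    '_nn' (at least two digits) before the image extension.
    """
    if version < 0:
        raise ValueError("version must be non-negative")
    if version == 0:
        return f"{image_name}.xmp"
    base, dot, extension = image_name.rpartition(".")
    if not dot:  # no extension: append the suffix to the whole name
        return f"{image_name}_{version:02d}.xmp"
    return f"{base}_{version:02d}.{extension}.xmp"


print(sidecar_name("IMG_8366.CR2"))      # IMG_8366.CR2.xmp
print(sidecar_name("IMG_8366.CR2", 1))   # IMG_8366_01.CR2.xmp
```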