Welcome to the TileDB docs!

Here you will find a comprehensive description of all the TileDB features, installation guides, tutorials, and API references. You will also be able to suggest edits and submit your questions. Enjoy!


Novel Format

TileDB introduces a novel on-disk format for storing multi-dimensional arrays. Unlike other popular systems (e.g., HDF5), which are optimized mostly for dense arrays, TileDB is optimized for both dense and sparse arrays and exposes a unified array API for both. In addition, the TileDB format allows for efficient data ingestion and updates.
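To make the dense/sparse distinction concrete, here is a minimal Python sketch of two in-memory containers that share one read interface. It is only an illustration of the idea: the class names `DenseArray` and `SparseArray` and their methods are hypothetical, and nothing here reflects the actual TileDB on-disk format or API.

```python
class DenseArray:
    """Hypothetical dense container: stores every cell in row-major order."""

    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self.cells = [fill] * (rows * cols)

    def write(self, r, c, value):
        self.cells[r * self.cols + c] = value

    def read(self, r0, r1, c0, c1):
        """Return {(r, c): value} for every cell in [r0, r1) x [c0, c1)."""
        return {(r, c): self.cells[r * self.cols + c]
                for r in range(r0, r1) for c in range(c0, c1)}


class SparseArray:
    """Hypothetical sparse container: stores only non-empty cells by coordinate."""

    def __init__(self):
        self.cells = {}

    def write(self, r, c, value):
        self.cells[(r, c)] = value

    def read(self, r0, r1, c0, c1):
        """Return {(r, c): value} for the non-empty cells in [r0, r1) x [c0, c1)."""
        return {(r, c): v for (r, c), v in self.cells.items()
                if r0 <= r < r1 and c0 <= c < c1}
```

Both classes answer the same `read(r0, r1, c0, c1)` query, but the sparse one pays storage only for the cells that were written, which is what matters when the coordinate space is huge and mostly empty.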

Integration

TileDB is a library exposing a C API, which makes it easy to integrate with popular higher-level programming languages (e.g., R, Python, MATLAB, Java) and data science tools (e.g., NumPy, Pandas, Spark).

Parallelism

TileDB is thread- and process-safe, allowing users to build powerful parallel computational engines on top of the TileDB array storage, with either multithreading or multiprocessing (e.g., using OpenMP / MPI). In addition, TileDB supports asynchronous writes and reads, enabling users to overlap IO with CPU-intensive operations and thereby boost performance.
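The benefit of overlapping IO with computation can be sketched in plain Python. The example below does not use the TileDB asynchronous API; `read_tile` and `process` are hypothetical stand-ins for an IO-bound read and a CPU-bound step, and the standard-library `ThreadPoolExecutor` plays the role of the asynchronous reader.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def read_tile(tile_id):
    """Hypothetical stand-in for an IO-bound read of one tile (4 cells)."""
    time.sleep(0.01)  # simulate IO latency
    return list(range(tile_id * 4, tile_id * 4 + 4))


def process(cells):
    """Hypothetical stand-in for CPU-bound work on one tile."""
    return sum(c * c for c in cells)


def pipeline(tile_ids):
    """Process tile i while the read for tile i+1 is already in flight."""
    results = []
    if not tile_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_tile, tile_ids[0])
        for next_id in tile_ids[1:]:
            cells = pending.result()                   # wait for the current tile
            pending = pool.submit(read_tile, next_id)  # start the next read
            results.append(process(cells))             # compute while it loads
        results.append(process(pending.result()))
    return results
```

Calling `pipeline([0, 1, 2])` returns one result per tile, in order. The point of the structure is that the next read is submitted before the current tile is processed, so read latency is hidden behind computation rather than added to it.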

Performance

Beyond its efficient data format, TileDB is written in C/C++ and incorporates many low-level optimizations for IO efficiency and a small main-memory footprint. The VLDB 2017 research paper reports that TileDB outperforms competing solutions on array storage operations.

Compression

TileDB can compress array data with a wide range of compressors, such as GZIP, BZIP2, LZ4, Zstandard, Blosc, double-delta, and run-length encoding. TileDB groups array elements (cells) into tiles, which are the atomic unit of compression and IO. This enables fast slicing and dicing of arrays while achieving high compression ratios. TileDB can be easily extended to support more compression mechanisms.
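The following Python sketch illustrates why making the tile the unit of compression helps slicing: a range read decompresses only the tiles that overlap the requested range. It is a simplified one-dimensional illustration using the standard-library `zlib` module, not the TileDB implementation; `TILE_SIZE`, `compress_tiles`, and `read_slice` are hypothetical names chosen for this example.

```python
import array
import zlib

TILE_SIZE = 4  # cells per tile (hypothetical value for illustration)


def compress_tiles(cells, tile_size=TILE_SIZE):
    """Split a 1D list of integer cells into tiles and compress each tile independently."""
    tiles = []
    for start in range(0, len(cells), tile_size):
        raw = array.array("i", cells[start:start + tile_size]).tobytes()
        tiles.append(zlib.compress(raw))
    return tiles


def read_slice(tiles, lo, hi, tile_size=TILE_SIZE):
    """Return (cells in [lo, hi), number of tiles decompressed)."""
    if hi <= lo:
        return [], 0
    out, touched = [], 0
    for t in range(lo // tile_size, (hi - 1) // tile_size + 1):
        cells = array.array("i")
        cells.frombytes(zlib.decompress(tiles[t]))  # decompress this tile only
        touched += 1
        base = t * tile_size
        out.extend(cells[max(lo, base) - base:min(hi, base + tile_size) - base])
    return out, touched
```

For a 10-cell array split into three tiles, reading cells 3 to 5 touches two tiles and leaves the third compressed. Smaller tiles reduce the amount of data decompressed per slice, while larger tiles generally compress better, so the tile size is a trade-off between the two.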

Multiple Backends

TileDB is constantly being optimized for a wide range of storage backends in addition to local filesystems, such as Hadoop Distributed File System (HDFS), S3 object storage, Google File System (GFS), and more. TileDB abstracts the array storage layer, offering users a unified view of their arrays that is agnostic to the actual storage backend.

Features