Welcome to TileDB Embedded!

What is TileDB Embedded?

TileDB Embedded is a universal storage engine that stores any kind of data (beyond tables) in a powerful unified format, offering extreme interoperability via many APIs and tool integrations.

TileDB Embedded is a powerful engine architected around multi-dimensional arrays that enables storing and accessing:

Dense arrays (e.g., images, video and more)
Sparse arrays (e.g., LiDAR, genomics and more)
Dataframes (any tabular data, as either dense or sparse arrays)
Any data that can be modeled as arrays (e.g., graphs, key-values, ML models, etc.)

You can use TileDB to store data in a variety of applications, such as Genomics, Geospatial, Biomedical Imaging, Finance, Machine Learning, and more. The power of TileDB stems from the fact that any data can be modeled efficiently as either a dense or a sparse multi-dimensional array, which is the format used internally by most data science tooling. By storing your data and metadata in TileDB arrays, you abstract away the pains of data storage and management, while efficiently accessing the data with your favorite programming language or data science tool via our numerous APIs and integrations.
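To make the dense/sparse distinction concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use the TileDB API; the names `dense_image`, `sparse_points`, `slice_dense` and `slice_sparse` are hypothetical and exist only for illustration. A dense array stores a value for every cell in its domain, while a sparse array stores only the non-empty cells together with their coordinates.

```python
# Conceptual sketch only: plain Python data structures, not the TileDB API.

# Dense array: every cell in the domain has a value
# (e.g., a tiny 3x4 grayscale image).
dense_image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
]

# Sparse array: only non-empty cells are stored, keyed by their coordinates
# (e.g., a few LiDAR-like points scattered over a very large 2D grid).
sparse_points = {
    (2, 7): 1.5,
    (1000, 3): 2.25,
    (1000, 999_999): 0.5,
}


def slice_dense(array, rows, cols):
    """Return the sub-array covering half-open row and column ranges."""
    r0, r1 = rows
    c0, c1 = cols
    return [row[c0:c1] for row in array[r0:r1]]


def slice_sparse(cells, rows, cols):
    """Return the non-empty cells whose coordinates fall in half-open ranges."""
    r0, r1 = rows
    c0, c1 = cols
    return {
        (r, c): value
        for (r, c), value in cells.items()
        if r0 <= r < r1 and c0 <= c < c1
    }


# Slicing a dense array returns a full rectangular block of values.
dense_result = slice_dense(dense_image, (1, 3), (1, 3))

# Slicing a sparse array returns only the cells that actually exist.
sparse_result = slice_sparse(sparse_points, (0, 1001), (0, 10))
```

In both cases the query is a multi-dimensional range; what differs is whether the result is a full block of values or just the cells that exist. A real storage engine would of course index and compress this data rather than scan a dictionary.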

TileDB Embedded is a fast embeddable C++ library with the following main features:

Open-source under the MIT license
Fast multi-dimensional slicing via tiling (i.e., chunking)
Multiple compression, encryption and checksum filters
Fast, lock-free ingestion
Parallel IO for both reads and writes
Cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage)
A fully multi-threaded implementation
Query condition execution push-down
Schema evolution
Data versioning and time traveling
Metadata stored alongside the array data
Groups for hierarchical organization of array data
A growing set of APIs (C, C++, C#, Python, Java, R, Go)
Numerous integrations (Spark, Dask, MariaDB, GDAL, and more)
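As a rough illustration of why tiling (chunking) helps multi-dimensional slicing, the sketch below computes which tiles a range query would need to touch. It is plain Python under simplifying assumptions (a domain starting at 0, fixed tile extents per dimension, inclusive query ranges) and is not the TileDB implementation; the function name `tiles_overlapping` is hypothetical.

```python
from itertools import product


def tiles_overlapping(ranges, tile_extents):
    """Return the coordinates of every tile that intersects a range query.

    ranges:       one inclusive (lo, hi) pair per dimension
    tile_extents: the tile size along each dimension
    Assumes each dimension's domain starts at 0.
    """
    per_dim = [
        range(lo // extent, hi // extent + 1)
        for (lo, hi), extent in zip(ranges, tile_extents)
    ]
    return list(product(*per_dim))


# A 1000x1000 array split into 100x100 tiles has 100 tiles in total.
# Slicing rows 150..260 and columns 0..99 only needs to touch two of them.
touched = tiles_overlapping([(150, 260), (0, 99)], (100, 100))
```

Only the tiles returned by such a computation would need to be fetched and decompressed, which is the intuition behind tile-based slicing being fast.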

Code and APIs

The TileDB Embedded engine is built in C++ and exposes a C and a C++ API:

https://github.com/TileDB-Inc/TileDB

We maintain a growing set of language APIs built on top of the C and C++ APIs:

Python: https://github.com/TileDB-Inc/TileDB-Py
R: https://github.com/TileDB-Inc/TileDB-R
Java: https://github.com/TileDB-Inc/TileDB-Java
Go: https://github.com/TileDB-Inc/TileDB-Go
C#: https://github.com/TileDB-Inc/TileDB-CSharp

Integrations and Extensions

We extended TileDB Embedded to capture domain-specific aspects of important use cases:

Population Genomics: An extension for storing and accessing genomic variant (VCF) data
Geospatial: Integrations with PDAL, GDAL, Rasterio and MapServer
Distributed computing: Integration with Spark and Dask
SQL: Integration with MariaDB, Presto and Trino

Getting Started

Our blog post Why Arrays as a Universal Model is a good starting point for understanding why we chose arrays as first-class citizens in TileDB Embedded.

There is a constantly growing set of tutorials in the TUTORIALS page group found in the left navigation menu of these docs.

If you'd like to take a deeper dive into the TileDB Embedded internals, you can check BACKGROUND in the left navigation menu. You can also always consult the HOW TO guides and the API REFERENCE.

Finally, detailed information about the various TileDB Embedded tool integrations and extensions can be found under the INTEGRATIONS & EXTENSIONS page group in the left navigation menu.

How to Use the Docs

To make it easy to understand where to find what you are looking for, the documentation is structured in the following sections:

Tutorials: A series of examples for learning how to use TileDB in various use cases
Background: Explanation of key topics and concepts
How To: Short how-to guides for all the different features of TileDB
API Reference: Technical reference to the APIs
Extensions & integrations: Detailed documentation on the TileDB Embedded extensions and integrations

History

TileDB started at MIT and Intel Labs as a research project in late 2014 that led to a VLDB 2017 paper. In May 2017 it was spun out into TileDB, Inc., a company that has since raised over $20M to further develop and maintain the project (see Series A announcement).

The company maintains two offerings:

The open-source TileDB Embedded storage engine, which is covered in this documentation.
The commercial data management platform called TileDB Cloud, which builds upon TileDB Embedded and offers data governance, scalable serverless compute and more.

Get Involved

TileDB Embedded and its APIs and integrations are open-source projects that welcome all forms of contribution. Contributors to the project should read the contribution docs for more information.

We'd love to hear from you. Drop us a line at hello@tiledb.com, join our Slack community, visit our forum or contact form, or follow us on Twitter to stay informed of updates and news.

Other Resources

You can also check out the TileDB blogs and events (webinars and workshops) to learn more about the TileDB vision, value proposition and use cases, as well as meet the team behind all this amazing work.
