The amount of genomics data being generated is growing exponentially. Genomics data come in many different file formats, such as FASTQ, BAM, VCF, and PLINK. All of these formats share a common characteristic: they record information about a particular individual (or a collection of individuals) at particular positions in the genome.
Genomics data collected from multiple individuals can be modeled as an enormous matrix, where the rows represent individuals (or samples) and the columns represent genomic positions. A haploid human genome comprises 23 chromosomes, which collectively span approximately 3 billion positions (base pairs). TileDB can efficiently store this information as dense or sparse 2D arrays (depending on the type of data) and compress it heavily by exploiting tiling. Tiling also improves data locality, which allows for rapid access to the compressed array.
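The matrix model above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not TileDB's API or its ingestion logic: the chromosome lengths are toy values (not real human chromosome sizes), and `global_position` and `to_sparse_cells` are hypothetical helper names introduced here. Each (chromosome, position) pair is mapped to a single global column index, and only the non-empty cells are kept, which is the essence of a sparse 2D array.

```python
from itertools import accumulate

# Toy chromosome lengths (hypothetical, NOT real human chromosome sizes).
CHROM_LENGTHS = {"chr1": 1000, "chr2": 800, "chr3": 600}


def build_offsets(lengths):
    """Return the 0-based global start column of each chromosome."""
    names = list(lengths)
    ends = list(accumulate(lengths[name] for name in names))
    starts = [0] + ends[:-1]
    return dict(zip(names, starts))


OFFSETS = build_offsets(CHROM_LENGTHS)


def global_position(chrom, pos):
    """Map a 1-based (chromosome, position) pair to a 0-based column index."""
    if not 1 <= pos <= CHROM_LENGTHS[chrom]:
        raise ValueError(f"position {pos} is out of range for {chrom}")
    return OFFSETS[chrom] + pos - 1


def to_sparse_cells(records, sample_ids):
    """Convert (sample, chrom, pos, value) records to {(row, col): value}."""
    row_of = {sample: row for row, sample in enumerate(sample_ids)}
    cells = {}
    for sample, chrom, pos, value in records:
        cells[(row_of[sample], global_position(chrom, pos))] = value
    return cells


# Hypothetical variant calls: (sample, chromosome, 1-based position, genotype).
records = [
    ("sampleA", "chr1", 10, "0/1"),
    ("sampleB", "chr2", 5, "1/1"),
]
cells = to_sparse_cells(records, ["sampleA", "sampleB"])
```

Only two cells are materialized here, although the logical matrix has 2 rows and 2,400 columns. At genome scale the same gap between logical size and stored cells is what makes a sparse representation attractive.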
Medical imaging is another ideal application for TileDB, as such data can be naturally represented as 2D or 3D arrays, which can be either dense or sparse depending on the use case. We are currently exploring the ingestion of MRI and CT scan images.
Geospatial data, such as point clouds, are naturally modeled as sparse 2D or 3D arrays. We are currently working on ingesting LAS/LAZ files, which store LiDAR data.
Financial time series are essentially sparse or dense vectors, which can be grouped into dense or sparse matrices. TileDB can store these effectively and compress them with mechanisms such as double-delta compression.
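To make the idea of double-delta compression concrete, here is a minimal sketch of the transform in plain Python. It shows only the arithmetic (storing the differences between consecutive differences) and is not TileDB's actual implementation, which this page does not describe; the function names and the sample timestamps are assumptions made for illustration.

```python
def double_delta_encode(values):
    """Encode integers as [first value, first delta, delta-of-deltas...]."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    delta_of_deltas = [b - a for a, b in zip(deltas, deltas[1:])]
    return [values[0], deltas[0]] + delta_of_deltas


def double_delta_decode(encoded):
    """Invert double_delta_encode and recover the original integers."""
    if len(encoded) < 2:
        return list(encoded)
    delta = encoded[1]
    values = [encoded[0], encoded[0] + delta]
    for dd in encoded[2:]:
        delta += dd                      # rebuild the running delta
        values.append(values[-1] + delta)
    return values


# Hypothetical tick timestamps arriving at a nearly constant interval.
timestamps = [1000, 1010, 1020, 1030, 1041]
encoded = double_delta_encode(timestamps)
decoded = double_delta_decode(encoded)
```

For regularly spaced series most encoded entries are zero or close to it, so they need far fewer bits than the raw values. A real compressor would follow this transform with a bit-packing or entropy-coding stage, which is omitted here.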
Social graphs can be mathematically represented as adjacency matrices. These are typically enormous, sparse 2D arrays, where the rows and columns are the nodes of the social graph, and the array elements indicate a relationship between two nodes. TileDB's sparse array support provides an efficient means of storing and retrieving graph data.
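The adjacency-matrix view of a graph can be sketched as follows. This is a plain-Python illustration under stated assumptions, not TileDB code: `edges_to_sparse` and `neighbors` are hypothetical helpers, the user names are made up, and the graph is treated as undirected by default. Only the non-zero cells of the matrix are stored, as (row, column) coordinates.

```python
def edges_to_sparse(edges, directed=False):
    """Return (node -> index map, sorted list of non-zero (row, col) cells)."""
    node_ids = {}
    cells = set()
    for u, v in edges:
        # Assign the next free index the first time a node is seen.
        i = node_ids.setdefault(u, len(node_ids))
        j = node_ids.setdefault(v, len(node_ids))
        cells.add((i, j))
        if not directed:
            cells.add((j, i))  # an undirected edge is symmetric
    return node_ids, sorted(cells)


def neighbors(cells, row):
    """List the column indices of the non-zero cells in one matrix row."""
    return [col for r, col in cells if r == row]


# Hypothetical friendships in a tiny social graph.
edges = [("alice", "bob"), ("bob", "carol")]
node_ids, cells = edges_to_sparse(edges)
bob_friends = neighbors(cells, node_ids["bob"])
```

Here the logical matrix is 3 x 3 but only four cells are stored. In a real social graph the fraction of non-zero cells is far smaller, which is why a sparse layout is the natural fit.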
We are currently building ingestion tools for various data formats used in the above applications. Please contact us if you are interested in a particular application or format, or if you have suggestions about an application area where you think TileDB may be advantageous.