Frequently Asked Questions

When is Metaform a good choice?

Metaform is well suited for environments where data needs to be explored or analyzed without ingestion, transformation pipelines, or external dependencies. It is designed for teams that value transparency, portability, and full control over their data, allowing them to operate entirely within their own environment. Metaform automatically detects structure, infers types, and exposes data as SQL tables, reducing integration effort and operational complexity.

Specific use cases include:

  • Federated querying across local files, databases, services, and APIs
  • Schema-on-read exploration of unstructured or semi-structured data such as PDFs, logs, and spreadsheets
  • Edge analytics where data must stay local or in secure enclaves
  • Rapid prototyping of data-driven applications without ETL overhead
  • Compliance and audit initiatives requiring traceable, in-place analysis

Metaform can return results from local data in milliseconds and scale from single-user laptops to distributed clusters. However, Metaform is not designed for high-volume transactional workloads or long-running batch processing; it excels where access, flexibility, and autonomy matter most.
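
As an illustration of federated querying, the sketch below joins a local CSV export with a table from an attached PostgreSQL source. The path-based table name, the "crm" connector name, and the column names are hypothetical and used only to convey the idea; refer to the Metaform SQL documentation for the actual conventions.

  -- Hypothetical sketch: join a local CSV file with a live PostgreSQL table.
  -- The path-to-table mapping and the "crm" connector name are illustrative only.
  SELECT o.order_id,
         o.total,
         c.account_manager
  FROM "/data/exports/orders.csv" AS o
  JOIN crm.public.customers AS c
    ON o.customer_id = c.id
  WHERE o.total > 1000
  ORDER BY o.total DESC;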


How easy is it to get started with Metaform?

You can get started with Metaform in minutes using Docker. Pull the official Metaform image and run it locally to instantly query files, folders, and systems with SQL—no installation, configuration, or database setup required. Metaform’s web console and command-line tools make it easy to connect to your data sources, explore schemas, and run your first queries right away.
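
Once the container is up, a first query might look like the sketch below. The mount point and the way Metaform names file-backed tables are assumptions made for illustration; the Quickstart documents the exact syntax.

  -- Hypothetical first query against a mounted folder.
  -- Assumes files under /data are exposed as tables addressed by path.
  SELECT *
  FROM "/data/sales/q1_report.xlsx"
  LIMIT 10;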

For production deployments, Metaform can scale from a single-node instance to a distributed cluster with persistent storage and secure, role-based access.

For detailed setup instructions and examples, see the Quickstart section of the Metaform documentation.


How does Metaform scale?

Metaform scales horizontally with minimal configuration or operational overhead.

At its core, Metaform is a distributed query engine that operates over a virtualized data layer rather than a centralized database. Each connector—whether a file system, API, or database—is treated as a logical source that can be queried independently or joined with other sources in real time. As data volumes grow, Metaform distributes query execution across nodes, automatically parallelizing scans, filters, and aggregations to maintain low-latency performance.
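
For example, an aggregation over a folder of daily log files is the kind of query whose scans and filters can be split across nodes and merged. The sketch below assumes the folder is exposed as a single virtual table with parsed columns; the naming and columns are illustrative, not confirmed Metaform syntax.

  -- Hypothetical aggregation over a directory of log files.
  -- Each node can scan a subset of the files; partial counts are then combined.
  SELECT status_code,
         COUNT(*) AS requests
  FROM "/var/log/nginx/"
  WHERE request_time >= TIMESTAMP '2024-01-01 00:00:00'
  GROUP BY status_code
  ORDER BY requests DESC;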

When deployed across multiple machines or containers, Metaform nodes coordinate via lightweight metadata sharing to balance workloads and cache results. Each node can autonomously access local or remote data sources, allowing the system to scale linearly as new nodes are added. This decentralized model eliminates the need for ingestion pipelines or complex replication, keeping infrastructure simple and data analysis fluid—from a single laptop to a fully distributed cluster.


How does Metaform survive failures?

Metaform is designed to survive software and hardware failures—from node restarts to network interruptions—without data loss or manual intervention. Because Metaform performs schema-on-read analytics rather than ingestion, it inherently avoids many of the durability challenges that affect traditional databases. Instead of storing data, Metaform dynamically reconstructs schemas from source files or systems, ensuring that your data remains intact and queryable even if a node or service fails.

Resilient Architecture

Metaform distributes query execution across independent nodes, each capable of accessing its own data sources. If a node fails mid-query, other nodes continue processing unaffected partitions, while the failed operation can be retried automatically once the node rejoins. Metadata about queries, connectors, and configurations can be stored in replicated backends such as PostgreSQL, providing fault-tolerant state management without introducing centralized bottlenecks.

Automatic Recovery

When transient failures occur—such as a container restart or temporary network outage—Metaform automatically retries connections to data sources and resumes query operations once connectivity is restored. For longer-term disruptions, nodes can be safely restarted or replaced without reimporting data or reconfiguring pipelines. Since Metaform’s architecture keeps computation stateless and data externalized, recovery time is effectively limited to the restart time of your infrastructure.

In essence, Metaform achieves fault tolerance not through data replication, but through resilience by design—by leaving your data where it lives, distributing computation, and making every part of the system recover gracefully on its own.


What is Metaform SQL?

At its core, Metaform is a schema-on-read data fabric built to unify files, folders, and systems under one query model—but its external interface is standard SQL. By using SQL as its universal language, Metaform allows anyone familiar with relational concepts such as schemas, tables, columns, and joins to query raw, unstructured data using tools they already know.

Whether your data comes from a spreadsheet, PDF, ACH file, directory listing, or API, Metaform exposes it as a virtual table that can be filtered, joined, and aggregated like any other dataset. This approach lets you run powerful queries—such as joining a CSV file with a live REST endpoint or filtering log data with regex—without any ingestion or transformation steps.
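
As a concrete sketch of the second case, the query below filters log data with a regular expression. It assumes a REGEXP_LIKE-style function and a path-based table name with parsed columns, all of which are illustrative rather than confirmed Metaform syntax.

  -- Hypothetical regex filter over a log file exposed as a virtual table.
  -- REGEXP_LIKE and the column names are assumptions for illustration.
  SELECT ts, level, message
  FROM "/data/logs/app.log"
  WHERE level = 'ERROR'
    AND REGEXP_LIKE(message, 'timeout|connection refused');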

Because Metaform extends standard SQL rather than replacing it, it remains compatible with existing SQL dialects and client tools. You can connect BI platforms, command-line clients, or any JDBC-based tool to Metaform over a standard SQL connection and immediately start exploring your data.

We’re actively preparing more detailed documentation and will be adding it here shortly.