Logfile Connector

The Logfile connector bridges the gap between unstructured text logs and structured analytics. It allows users to query arbitrary log files—such as server logs, application traces, or audit trails—using standard SQL without custom parsers, shell scripts, or manual preprocessing.

At its core, the connector uses a user-defined regular expression (regex) to describe the schema of each record. Every capture group in the regex becomes a column, named either through the optional schema definition or automatically as field_n, and each matching line (or multi-line block) becomes a row. When you query a log file, the connector applies the regex to every entry, extracting structured fields such as timestamps, log levels, or message text, and transforming what is often messy, free-form output into data your query engine can reason about.
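Conceptually, the extraction works like the following Python sketch. The pattern and helper function here are illustrative only, not part of the connector's API:

```python
import re

# Illustrative pattern: date, level, free-text message.
pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\w+)\s+(.*)$")

def parse_line(line):
    """Return one value per capture group, or None for a non-matching line."""
    m = pattern.match(line)
    return None if m is None else m.groups()

row = parse_line("2025-10-01 INFO Starting nightly ACH batch process")
# row == ("2025-10-01", "INFO", "Starting nightly ACH batch process")
```

Each value in the returned tuple corresponds to one column of the resulting row; a line the pattern does not match yields no row.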

This connector is designed for both simplicity and precision. With just a few lines of configuration, you can turn any text file into a queryable dataset. Yet, for more complex logs—where patterns vary, messages span multiple lines, or fields appear conditionally—you can fine-tune behavior using a rich set of options that control how Metaform applies the regex, interprets matches, and constructs the resulting schema.


Configuration Options

The simplest possible configuration registers the connector and maps the .log file extension to it:

"logfile": {
  "type": "log",
  "extension": "log"
}

Advanced Configuration

You’ll often want greater control—such as defining how log entries are parsed, which file extensions to include, and how extracted fields are typed. All of these advanced options must be set in the connector’s configuration, allowing you to fine-tune how Metaform reads and structures log data without modifying the source files themselves.

The table below summarizes the complete set of available configuration parameters:

Option    | Default    | Description
----------|------------|------------
type      | (required) | Must be log. Specifies that the Logfile connector should interpret each line using a user-defined regular expression.
regex     | (required) | The regular expression that defines how log file lines are split. Enclose the parts of the regex you wish to extract in grouping parentheses. This plugin uses Java regular expressions, so shortcuts such as \d must be double-escaped (\\d).
extension | (required) | File extension mapped to this configuration. Only a single extension may be defined.
maxErrors | 10         | The number of errors the reader will ignore before halting execution. Useful for messy or inconsistent log files.
schema    | (optional) | Defines the structure of the log file. If omitted, all fields are automatically named field_n and assigned a default data type of VARCHAR. See the schema field options below for details.
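The double-escaping rule for the regex option can be verified with a quick sketch. Python is used here purely for illustration; the connector itself runs Java regexes, but the JSON escaping behaves the same way:

```python
import json
import re

# The regex reaches the engine via JSON, so each backslash in the pattern
# must be doubled in the configuration file: "\\d" in JSON arrives as \d.
config = json.loads(r'{"regex": "^(\\d{4})"}')

assert config["regex"] == r"^(\d{4})"           # JSON unescaped \\d to \d
assert re.match(config["regex"], "2025-10-01")  # four leading digits match
```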

Schema Definition

The schema option defines the structure of the log file as a JSON array of fields. Each field describes one group extracted by your regex.

Schema Field | Default      | Description
-------------|--------------|------------
fieldName    | (required)   | The field name. Must be a valid Metaform field name and unique within the schema.
fieldType    | VARCHAR      | The data type of the field. Supported types include VARCHAR, INT, SMALLINT, BIGINT, FLOAT4, FLOAT8, DATE, TIMESTAMP, and TIME.
format       | (ISO format) | Format string for date/time fields. Defaults to ISO format when unspecified.
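As an illustrative example, a schema entry for a full timestamp column might look like the fragment below. The field name is hypothetical, and the format string assumes Java-style date patterns; confirm the exact pattern syntax against your Metaform version.

```json
{ "fieldName": "eventTime", "fieldType": "TIMESTAMP", "format": "yyyy-MM-dd HH:mm:ss" }
```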

These options provide the flexibility to handle everything from simple single-line logs to complex multi-line traces with timestamps, error codes, and structured messages—all while maintaining the efficiency and simplicity of SQL-based querying.


Putting It Together

When you configure Metaform’s Logfile connector, think of the log file as a timeline of events—each entry marks a point in time when something occurred. The regex pattern defines how Metaform extracts each component of a log line, while the schema parameter maps those components to typed columns for structured analysis.

Scenario

Imagine your accounting system writes an application log each night while processing ACH payment batches. Each line includes a date, a time broken into hours, minutes, and seconds, and a severity level that indicates the event’s importance.

2025-10-01 23:58:01 INFO  Starting nightly ACH batch process
2025-10-01 23:58:03 INFO  Loaded 1,250 payments from input directory
2025-10-01 23:58:05 WARN  Payment validation warning for Account 23891
2025-10-01 23:58:07 ERROR Failed to transmit batch ID 20251001-01
2025-10-01 23:58:15 INFO  Process completed with 1 error(s)

To convert this unstructured text into structured, queryable data, configure the connector as follows:

{
  "type": "log",
  "extension": "log",
  "regex": "^(\\d{4}-\\d{2}-\\d{2})\\s(\\d{2}):(\\d{2}):(\\d{2})\\s+(INFO|WARN|ERROR)\\s+.*$",
  "maxErrors": 10,
  "schema": [
    { "fieldName": "eventDate", "fieldType": "DATE", "format": "yyyy-MM-dd" },
    { "fieldName": "hour", "fieldType": "INT" },
    { "fieldName": "minute", "fieldType": "INT" },
    { "fieldName": "second", "fieldType": "INT" },
    { "fieldName": "level", "fieldType": "VARCHAR" }
  ]
}

How this works

  • type — Instructs Metaform to use the Logfile connector with regex-based parsing.
  • extension — Applies this configuration to any file ending in .log.
  • regex — Defines how each log line is split into capture groups representing the event date, hour, minute, second, and log level.
  • maxErrors — Allows up to 10 malformed lines before halting execution, useful for inconsistent logs.
  • schema — Names and types each field extracted by the regex, ensuring precise column typing and consistent query results.
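The steps above can be sanity-checked with a short Python sketch that applies the same pattern and schema types to two of the sample lines. This is illustrative only; Metaform evaluates the regex in Java, and in the JSON config the backslashes are double-escaped:

```python
import re
from datetime import date

# The example regex from the configuration above, written as a raw string.
pattern = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s(\d{2}):(\d{2}):(\d{2})\s+(INFO|WARN|ERROR)\s+.*$"
)

lines = [
    "2025-10-01 23:58:01 INFO  Starting nightly ACH batch process",
    "2025-10-01 23:58:07 ERROR Failed to transmit batch ID 20251001-01",
]

rows = []
for line in lines:
    m = pattern.match(line)
    if m is None:
        continue  # the connector would count such a line against maxErrors
    d, h, mi, s, level = m.groups()
    # Apply the schema's types: DATE, INT, INT, INT, VARCHAR.
    rows.append((date.fromisoformat(d), int(h), int(mi), int(s), level))
```

Each tuple corresponds to one row of the resulting table, with the hour, minute, and second groups cast to integers as the schema requests.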

Resulting Structure

After parsing, Metaform produces a structured dataset like this:

Column    | Type    | Example Value
----------|---------|--------------
eventDate | DATE    | 2025-10-01
hour      | INT     | 23
minute    | INT     | 58
second    | INT     | 7
level     | VARCHAR | ERROR

By structuring the log this way, Metaform transforms raw operational data into an analyzable dataset—allowing teams to track error rates, detect recurring transmission failures, and correlate warnings with downstream system events using nothing more than SQL.

Working Example

To explore logfile support in Metaform using this example, you'll need two files:

  • The Logfile connector configuration (.json)
  • A sample logfile (.log) to query

Installing the Connector

The Logfile connector is distributed as a standalone JSON file that registers a new storage plugin with your local Metaform instance. It defines the connector type, the file extension it handles, and its advanced parsing options.

To install the connector, run the following command:

curl -sSL https://docs.metaform.com/resources/examples/logfile-connector.json | curl -X POST -H "Content-Type: application/json" -d @- http://localhost:8047/storage/logfile.json

This command downloads the connector definition and registers it via Metaform’s REST API. Once installed, the Logfile format will appear in the Storage tab under the name logfile, and will be immediately available for queries—no restart required.

Trying the Example Data

Download the example log file here:

https://docs.metaform.com/resources/examples/data.log

The log file contains the following attributes:

  • Timestamped entries written by an overnight ACH batch job
  • Standard log levels (INFO, WARN, ERROR)
  • Occasional multi-line stack traces for transmission errors
  • A completion summary at the end of the run

This layout makes it ideal for testing the connector’s regex and schema parameters.

Save the file to a location accessible to your Metaform instance—for example:

~/data.log

Then, try querying it in the Web user interface:

SELECT
  eventDate, 
  level, 
  COUNT(*) AS count
FROM logfile.`data.log`
GROUP BY eventDate, level
ORDER BY eventDate ASC;

If the connector is installed correctly, Metaform will:

  • Recognize the .log file
  • Apply the configured regex and schema
  • Return a three-column table (eventDate, level, count)

We’re actively preparing more detailed documentation and will be adding it here shortly.