Logfile
The Logfile connector bridges the gap between unstructured text logs and structured analytics. It allows users to query arbitrary log files—such as server logs, application traces, or audit trails—using standard SQL without custom parsers, shell scripts, or manual preprocessing.
At its core, the connector uses a user-defined regular expression (regex) to describe the schema of each record. Every capture group in the regex becomes a column, and each matching line (or multi-line block) becomes a row. When you query a log file, the connector applies the regex to every entry, extracting structured fields like timestamps, log levels, or message text—transforming what’s often messy, free-form output into data your query engine can reason about.
This connector is designed for both simplicity and precision. With just a few lines of configuration, you can turn any text file into a queryable dataset. Yet, for more complex logs—where patterns vary, messages span multiple lines, or fields appear conditionally—you can fine-tune behavior using a rich set of options that control how Metaform applies the regex, interprets matches, and constructs the resulting schema.
Configuration Options
The simplest possible configuration enables the connector for the log files in your workspace, capturing each full line as a single field:

```json
"logfile": {
  "type": "logRegex",
  "extension": "log",
  "regex": "(.*)"
}
```
Advanced Configuration
You’ll often want greater control—such as defining how log entries are parsed, which file extensions to include, and how extracted fields are typed. All of these advanced options must be set in the connector’s configuration, allowing you to fine-tune how Metaform reads and structures log data without modifying the source files themselves.
The table below summarizes the complete set of available configuration parameters:
| Option | Default | Description |
|---|---|---|
| type | (required) | Must be logRegex. This specifies that the Logfile connector should interpret each line using a user-defined regular expression. |
| regex | (required) | The regular expression that defines how each log line is split into fields. Enclose the parts of the pattern you wish to extract in grouping parentheses. The connector uses Java regular expressions, so shortcuts such as \d must be double-escaped (\\d). |
| extension | (required) | File extension to be mapped to this configuration. You can only define a single extension. |
| maxErrors | 10 | Specifies the number of errors the reader will ignore before halting execution. Useful for messy or inconsistent log files. |
| schema | (optional) | Defines the structure of the log file. If omitted, all fields will be automatically named field_n and assigned a default data type of VARCHAR. See schema field options below for details. |
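For instance, a configuration that pulls the leading date out of each line and keeps the remainder in a second group might look like the sketch below; the plugin name and pattern are illustrative, and note the double-escaped \\d required by Java regex syntax. Because no schema is supplied, the two capture groups surface as field_0 and field_1, both typed VARCHAR:

```json
"logfile": {
  "type": "logRegex",
  "extension": "log",
  "regex": "^(\\d{4}-\\d{2}-\\d{2})\\s+(.*)$",
  "maxErrors": 10
}
```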
Schema Definition
The schema option defines the structure of the log file as a JSON array of fields. Each field describes one group extracted by your regex.
| Schema Field | Default | Description |
|---|---|---|
| fieldName | (required) | The field name. Must be a valid Metaform field name and unique within the schema. |
| fieldType | VARCHAR | The data type of the field. Supported types include VARCHAR, INT, SMALLINT, BIGINT, FLOAT4, FLOAT8, DATE, TIMESTAMP, and TIME. |
| format | (ISO format) | Format string for date/time fields. Defaults to ISO format when unspecified. |
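As a sketch (the field names here are illustrative), a schema that captures a full timestamp plus a level and a free-form message might look like the following. fieldType is omitted on the last two entries, so they fall back to the default VARCHAR:

```json
"schema": [
  { "fieldName": "eventTimestamp", "fieldType": "TIMESTAMP", "format": "yyyy-MM-dd HH:mm:ss" },
  { "fieldName": "level" },
  { "fieldName": "message" }
]
```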
These options provide the flexibility to handle everything from simple single-line logs to complex multi-line traces with timestamps, error codes, and structured messages—all while maintaining the efficiency and simplicity of SQL-based querying.
Putting It Together
When you configure Metaform’s Logfile connector, think of the log file as a timeline of events—each entry marks a point in time when something occurred. The regex pattern defines how Metaform extracts each component of a log line, while the schema parameter maps those components to typed columns for structured analysis.
Scenario
Imagine your accounting system writes an application log each night while processing ACH payment batches. Each line includes a date, a time broken into hours, minutes, and seconds, and a severity level that indicates the event’s importance.
```
2025-10-01 23:58:01 INFO Starting nightly ACH batch process
2025-10-01 23:58:03 INFO Loaded 1,250 payments from input directory
2025-10-01 23:58:05 WARN Payment validation warning for Account 23891
2025-10-01 23:58:07 ERROR Failed to transmit batch ID 20251001-01
2025-10-01 23:58:15 INFO Process completed with 1 error(s)
```
To convert this unstructured text into structured, queryable data, configure the connector as follows:
```json
{
  "type": "logRegex",
  "extension": "log",
  "regex": "^(\\d{4}-\\d{2}-\\d{2})\\s(\\d{2}):(\\d{2}):(\\d{2})\\s+(INFO|WARN|ERROR)\\s+.*$",
  "maxErrors": 10,
  "schema": [
    { "fieldName": "eventDate", "fieldType": "DATE", "format": "yyyy-MM-dd" },
    { "fieldName": "hour", "fieldType": "INT" },
    { "fieldName": "minute", "fieldType": "INT" },
    { "fieldName": "second", "fieldType": "INT" },
    { "fieldName": "level", "fieldType": "VARCHAR" }
  ]
}
```
How this works
- type — Instructs Metaform to use the Logfile connector with regex-based parsing.
- extension — Applies this configuration to any file ending in .log.
- regex — Defines how each log line is split into capture groups representing the event date, hour, minute, second, and log level. Each group maps, in order, to the corresponding schema entry.
- maxErrors — Allows up to 10 malformed lines before halting execution, useful for inconsistent logs.
- schema — Names and types each field extracted by the regex, ensuring precise column typing and consistent query results.
Resulting Structure
After parsing, Metaform produces a structured dataset like this:
| Column | Type | Example Value |
|---|---|---|
| eventDate | DATE | 2025-10-01 |
| hour | INT | 23 |
| minute | INT | 58 |
| second | INT | 7 |
| level | VARCHAR | ERROR |
By structuring the log this way, Metaform transforms raw operational data into an analyzable dataset—allowing teams to track error rates, detect recurring transmission failures, and correlate warnings with downstream system events using nothing more than SQL.
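For example, a query along the following lines would track error rates by hour once the connector is configured. The table path logfile.`data.log` matches the working example later in this section; adjust it to wherever your log file is exposed.

```sql
-- Sketch: hourly error counts per day; level and hour come from the schema above.
SELECT eventDate,
       hour,
       SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END) AS errors,
       COUNT(*) AS total_events
FROM logfile.`data.log`
GROUP BY eventDate, hour
ORDER BY eventDate, hour;
```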
Working Example
To explore logfile support in Metaform using this example, you'll need two files:
- The Logfile connector configuration (.json)
- A sample logfile (.log) to query
Installing the Connector
The Logfile connector is distributed as a standalone JSON file that registers a new storage plugin with your local Metaform instance. It defines the connector type, supported file extensions, and advanced parsing options.
To install the connector, run the following command:
```bash
curl -sSL https://docs.metaform.com/resources/examples/logfile-connector.json | curl -X POST -H "Content-Type: application/json" -d @- http://localhost:8047/storage/logfile.json
```
This command downloads the connector definition and registers it via Metaform’s REST API. Once installed, the Logfile format will appear in the Storage tab under the name logfile, and will be immediately available for queries—no restart required.
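To confirm the registration took effect, you can read the definition back. This sketch assumes the storage endpoint also answers GET requests at the path used above:

```bash
# Sketch: fetch the registered connector definition (assumes GET is supported
# on the same endpoint the POST above targeted).
curl http://localhost:8047/storage/logfile.json
```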
Trying the Example Data
Download the example log file here:
https://docs.metaform.com/resources/examples/data.log
The log file has the following characteristics:
- Timestamped entries written by an overnight ACH batch job
- Standard log levels (INFO, WARN, ERROR)
- Occasional multi-line stack traces for transmission errors
- A completion summary at the end of the run
This layout makes it ideal for testing the connector’s regex and schema parameters.
Save the file to a location accessible to your Metaform instance—for example:
~/data.log
Then, try querying it in the Web user interface:
```sql
SELECT
  eventDate,
  level,
  COUNT(*) AS count
FROM logfile.`data.log`
GROUP BY eventDate, level
ORDER BY eventDate ASC;
```
If the connector is installed correctly, Metaform will:
- Recognize the .log file
- Parse it with the configured regex and schema
- Return a three-column table based on the detected data
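For the five sample lines shown earlier, that query would report a single 2025-10-01 row per level: three INFO events, one WARN, and one ERROR. The downloadable data.log contains more entries, so your counts will differ. From there you can drill into individual severities; a sketch:

```sql
-- Sketch: list the individual ERROR events using the same configured schema.
SELECT eventDate, hour, minute, second
FROM logfile.`data.log`
WHERE level = 'ERROR'
ORDER BY hour, minute, second;
```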