Data Streams | Tom Sawyer Software

Tom Sawyer Data Streams is a schema-driven platform that unifies Kafka streams and other heterogeneous sources into a governed, query-ready knowledge graph you own. Visually design data flows, declare schemas, and apply precise transformations and filters with Spring Expression Language (SpEL). Continuously run your data flow to normalize, enrich, and link changing streams in real time, then persist the knowledge graph model in a graph database for scalable sharing and downstream analysis.

Tom Sawyer Data Streams reduces integration effort across legacy and streaming systems, fits existing pipelines and tools, and delivers a complete, accurate picture for lineage, impact analysis, and operational decisions.

Watch this short introduction to Data Streams for unifying data into a governed, real-time knowledge graph.

Data Streams 1.0 Beta Coming Soon

Contact us to join the beta and be the first to get early access to Data Streams. Your valuable input will help shape future releases, and our team of graph gurus can provide expert advice and guidance to help accelerate your data transformation objectives.

Designed for enterprise data architects

Enterprises face fragmented sources, legacy systems, and real-time demands. Data Streams supports a range of high-impact scenarios for data architects and enterprise teams, from migration and continuous sync to governance and risk. Discover how Data Streams addresses common challenges with repeatable patterns and proven tooling.

One-Time Data Migration

Migrate legacy relational models to a governed, query-ready knowledge graph. Introspect the source schema, design visual flows that rename, merge, or split tables and convert properties into edges, then execute once to materialize the model in a graph database.
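As a rough illustration of what "convert properties into edges" means, the sketch below turns a foreign-key column into an explicit relationship. It is plain Python, not the Data Streams flow editor or its API; the table names, column names, and the `rows_to_graph` helper are hypothetical.

```python
# Hypothetical relational rows; table and column names are illustrative only.
employees = [
    {"id": 1, "name": "Ada", "dept_id": 10},
    {"id": 2, "name": "Grace", "dept_id": 10},
]
departments = [{"id": 10, "title": "Research"}]


def rows_to_graph(employees, departments):
    """Turn rows into nodes and the dept_id foreign key into WORKS_IN edges."""
    nodes, edges = [], []
    for d in departments:
        # Rename the legacy "title" column to "name" for consistency.
        nodes.append({"label": "Department", "key": d["id"], "name": d["title"]})
    for e in employees:
        # The foreign key is left out of the node's properties...
        nodes.append({"label": "Employee", "key": e["id"], "name": e["name"]})
        # ...and becomes an explicit relationship instead.
        if e.get("dept_id") is not None:
            edges.append({
                "label": "WORKS_IN",
                "from": ("Employee", e["id"]),
                "to": ("Department", e["dept_id"]),
            })
    return nodes, edges


nodes, edges = rows_to_graph(employees, departments)
```

In a graph model, the relationship can then be traversed directly rather than reconstructed with a join at query time.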

Continuous Sync / Change Data Capture (CDC) to Graph

Keep operational systems and streams in lockstep with an always-current knowledge graph. Ingest Kafka topics and database changes, apply SpEL transformations and filters, and continuously normalize, enrich, and link records as they arrive.
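The general idea of filtering and applying change events can be sketched in a few lines. This is a simplified illustration in plain Python, with a dictionary standing in for the graph and a Python predicate standing in for a SpEL filter; the event shape (`op`, `key`, `after`) and the function names are assumptions, not the Data Streams or Kafka Connect format.

```python
def keep(change):
    """Filter step: drop synthetic test records (hypothetical rule)."""
    return not str(change["key"]).startswith("test-")


def apply_change(graph, change):
    """Apply one change event to an in-memory node store keyed by primary key."""
    if change["op"] == "delete":
        graph.pop(change["key"], None)
    else:
        # Inserts and updates are both treated as upserts, so replaying
        # the same event twice leaves the store unchanged.
        graph.setdefault(change["key"], {}).update(change["after"])
    return graph


changes = [
    {"op": "insert", "key": "u1", "after": {"name": "Ada", "status": "new"}},
    {"op": "update", "key": "u1", "after": {"status": "active"}},
    {"op": "insert", "key": "test-9", "after": {"name": "probe"}},
    {"op": "insert", "key": "u2", "after": {"name": "Grace"}},
    {"op": "delete", "key": "u2"},
]

graph = {}
for change in changes:
    if keep(change):
        apply_change(graph, change)
```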

Entity Resolution and Master Data

Create golden entities across silos to power consistent analytics and operations. Normalize identifiers, apply matching rules in SpEL, and persist deduplicated nodes and relationships for trusted 360° views.
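A minimal sketch of the normalize-then-match pattern follows. Python stands in for the SpEL matching rules here, and the single rule used (same normalized email means same entity) is a deliberately simple assumption; the record fields and function names are hypothetical.

```python
def normalize_email(value):
    """Normalize an identifier so trivially different spellings compare equal."""
    return value.strip().lower()


def resolve(records):
    """Group records that share a normalized email into one golden entity."""
    golden = {}
    for record in records:
        key = normalize_email(record["email"])
        entity = golden.setdefault(key, {"email": key, "sources": []})
        # Keep provenance: remember which silo contributed to the entity.
        entity["sources"].append(record["source"])
    return list(golden.values())


records = [
    {"source": "crm", "email": " Ada@Example.com "},
    {"source": "billing", "email": "ada@example.com"},
    {"source": "crm", "email": "grace@example.com"},
]
entities = resolve(records)
```

Real matching rules are usually richer (multiple identifiers, fuzzy comparison, survivorship), but the structure of normalizing first and matching second stays the same.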

Fraud and Risk Signal Fusion

Fuse transactions, accounts, devices, and geolocation to surface suspicious patterns in real time. Materialize relationship signals like shared devices or rings, enrich events in flow, and route high-confidence alerts downstream.
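To make "relationship signals like shared devices" concrete, the sketch below derives account-to-account links from login events that use the same device. It is an illustrative batch computation in plain Python with hypothetical field names, not a description of how Data Streams materializes these signals.

```python
from collections import defaultdict
from itertools import combinations


def shared_device_edges(logins):
    """Return (account, account, device) links for accounts sharing a device."""
    accounts_by_device = defaultdict(set)
    for login in logins:
        accounts_by_device[login["device"]].add(login["account"])

    edges = set()
    for device, accounts in accounts_by_device.items():
        # Every pair of distinct accounts on one device is a candidate signal.
        for a, b in combinations(sorted(accounts), 2):
            edges.add((a, b, device))
    return sorted(edges)


logins = [
    {"account": "acct-1", "device": "dev-A"},
    {"account": "acct-2", "device": "dev-A"},
    {"account": "acct-3", "device": "dev-B"},
    {"account": "acct-1", "device": "dev-A"},  # repeat login, no new edge
]
links = shared_device_edges(logins)
```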

Security and Identity Graph

Unify IAM, directory, endpoint, and network telemetry to understand access paths and blast radius. Model entitlements and dependencies, detect risky relationships, and support least-privilege and incident response with graph context.

Data Catalog, Lineage, and Governance

Consolidate schemas, datasets, policies, and lineage edges to accelerate impact analysis and compliance. Track how data moves, who uses it, and which rules apply, with the graph persisted in a graph database for broad sharing and auditability.

Real-time data integration with Apache Kafka and Confluent

Data Streams subscribes to Apache Kafka (or Confluent) topics you provision, then transforms and persists those events into your knowledge graph model. Use your tool of choice, such as Kafka Connect, Change Data Capture tools, or custom producers, to publish from your system of record, then subscribe to the topic in Data Streams. You control which topics represent nodes, edges, or attributes, and you can secure access with your existing Kafka authentication and encryption settings.

With Kafka, you can create topics from almost any system and feed them into Data Streams, including:

Relational databases - PostgreSQL, MySQL, SQL Server, Oracle.
NoSQL/search - MongoDB, Cassandra, Elasticsearch.
Graph/datastores - Neo4j, JanusGraph, RDF/SPARQL endpoints.
Warehouses - Snowflake, BigQuery, Redshift.
Files/object storage - CSV/JSON from S3, Azure Blob, GCS.
Applications/APIs - Salesforce, ServiceNow, Jira, SAP, Dynamics.
Log/IoT/event sources - syslog, MQTT, sensors, webhooks.

If it can publish to Kafka, the Data Streams beta can consume it and apply your schema, SpEL transformations, and real-time flow execution.

Create Kafka topics from your data sources for easy integration and transformation in Data Streams.

Getting started

Data Streams makes it simple to connect to your topics and build your knowledge graph model. Connect to your streams and apply transformations using the visual data flow editor.

View the resulting schema in an easily understandable graph drawing or tree view.

1. Connect to Kafka Topics

Enter the connection details to connect to the Kafka topics of interest.

2. Apply Transformations

Transform topics to convert them from nodes to edges. Rename topics for consistency and clarity.

3. Visualize and Validate the Schema

View the resulting schema as you work, either as a graph drawing or as a tree view.

4. Save and Deploy

Save your knowledge graph model to a JanusGraph Sink and deploy the stream for continuous execution.

Effortlessly apply transformations to topics

Transformations turn raw stream events into a coherent, query-ready model. Incoming topics enter the flow as nodes; you then promote the right connections by converting selected nodes into edges and standardize naming to match your domain vocabulary. The result is a knowledge graph model that reflects how things actually relate, not just how they were recorded upstream.
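The node-to-edge promotion described above can be sketched outside the product as follows. This is a hedged illustration in plain Python: the topic names, the `RENAMES` map, and the `to_node` and `promote_to_edge` helpers are all hypothetical and do not reflect the Data Streams API.

```python
# Hypothetical rename map from raw topic names to domain vocabulary.
RENAMES = {"crm.cust_v2": "Customer", "inv.asset_tbl": "Asset"}


def to_node(topic, event):
    """Every incoming event starts out as a node labeled after its topic."""
    return {"label": RENAMES.get(topic, topic), "key": event["id"], "props": event}


def promote_to_edge(node, label, source_field, target_field):
    """Convert a linking node into an edge between the two keys it references."""
    skipped = (source_field, target_field, "id")
    # Remaining fields travel along as edge properties.
    props = {k: v for k, v in node["props"].items() if k not in skipped}
    return {
        "label": label,
        "from": node["props"][source_field],
        "to": node["props"][target_field],
        "props": props,
    }


customer = to_node("crm.cust_v2", {"id": "c1", "name": "Acme"})
ownership = to_node(
    "inv.ownership",
    {"id": "o1", "customer_id": "c1", "asset_id": "a7", "since": 2021},
)
edge = promote_to_edge(ownership, "OWNS", "customer_id", "asset_id")
```

Here the ownership record, which upstream was just another row, becomes an `OWNS` relationship from customer `c1` to asset `a7`.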

For technical teams, this approach reduces custom Extract, Transform, Load operations and enforces consistency without heavy refactoring. You define intent visually, validate the evolving structure against your schema, and keep semantics stable across sources. Expressions (SpEL) give you precise, lightweight control when you need it, while keeping the overall flow simple to reason about and easy to review.

Building the knowledge graph model you wish you had is easy with Data Streams transformations.

Now that you have your knowledge graph model, what next?

Once your flows are running, the resulting knowledge graph lives in your own environment. Persisted in a graph database, the model plugs into your existing pipelines, catalogs, governance, and security policies: query it directly, push extracts to warehouses, trigger downstream jobs, and integrate with lineage, MDM, or observability tools. Because the model and data are yours, you can evolve schemas, add sources, and version changes without vendor lock-in.

In the beta version, Data Streams stores your model in a JanusGraph Sink. Support for additional graph databases is planned for future releases.

For added value, pair Data Streams with Tom Sawyer Software’s visualization products to explore, analyze, and communicate the value of the graph.

Tom Sawyer Explorations

The Explorations graph intelligence application empowers analysts to rapidly uncover insights through data integration, graph pattern matching, and advanced graph visualizations and analysis, all without any coding and without having to know query languages.

Tom Sawyer Perspectives

Perspectives is a powerful, low-code development platform for building standalone applications or embedding advanced data analysis and visualization into existing systems. Use advanced layouts, styling, and UX components to build custom apps and dashboards on top of your graph model.

The benefits of Data Streams are clear

Data Streams helps teams break through data silos so they can see the full picture. It provides a practical path to build a usable knowledge graph from disparate systems and streams, and to turn that graph into real-time insight for operations and analysis.

Unified View of Critical Data

Consolidate relational, NoSQL, files, APIs, and event streams via Kafka into one consistent, query-ready model.

Faster Time to Insight

Visual flow design and expressive transformations reduce custom ETL work and shorten integration cycles.

Real-Time Processing

Execute flows continuously to normalize, enrich, and link events as they arrive for timely decisions.

Higher Data Quality and Consistency

Schema-driven mapping, validation, and filtering produce cleaner entities and relationships.

Context-Rich Analytics

Model relationships for impact analysis, lineage tracing, and root-cause investigation.

Governance and Auditability

Persist the graph in JanusGraph for shared access, versionable schemas, and clear provenance.

Ownership and Portability

Your model and data live in your database, ready for use with existing pipelines, tools, and policies.

Secure Integration

Works with your Kafka authentication and encryption settings.

Incremental Adoption

Start with a single migration or CDC flow and expand as value is proven across teams.