Colin R. Moran

Software Engineer · Distributed Systems & Data Platforms

I design and build resilient systems, data pipelines, and tools that make debugging and scaling less painful.

Expertise

Building systems that scale

From distributed data platforms to cloud migrations, I focus on reliability, observability, and team growth.

10+

Years Experience

Building and delivering features for real customers.

Cloud Migrations

On-prem to cloud with Kubernetes and serverless adoption.

100+

Engineering Interviews

Team development and mentorship.

Experience

What I've delivered

Roles where I owned systems, shipped improvements, and handled real-world constraints.

Software Engineer - Axon

Remote · Marysville, OH

2022 – Present

Raised code coverage to ~80% across four reporter services and added metrics and monitoring to reduce mean time to discovery
Reworked report generation flow to be more efficient and maintainable while improving data consistency
Implemented indexing for new entity types to reduce SQL Server load
Produced architecture diagrams to clarify system design and improve cross-team alignment
Conducted 100+ software engineering interviews and mentored junior/mid-level developers
Participated in on-call rotation to monitor and troubleshoot critical production incidents

Enterprise Architect - CAS

Columbus, OH

2021 – 2022

Led selection of an enterprise DSML platform using frameworks, scorecards, and reference architectures
Designed a scalable fuzzy-matching solution for citation matching, based on a bipartite graph and built on Apache Flink
Helped refine architecture processes and improved transparency and alignment with stakeholders
Started a machine learning community of practice and established ML ops best practices

TAP Engineer - CAS

Columbus, OH

2019 – 2021

Led migration of on-prem applications to AWS, including breaking down work and collaborating with AWS experts
Drove adoption of AWS-native serverless and Kubernetes solutions to reduce operational costs
Created SDLC best practices for public cloud and standard operating procedures for major projects
Led efforts around cloud security and Kubernetes migration strategies

Software Engineer - CAS

Columbus, OH

2018 – 2019

Developed Java applications and debugged complex production issues
Improved document workflows using ML and automated manual unlocks, saving hundreds of hours

Systems I've Built

Real systems, services, and infrastructure

A sampling of the distributed systems, pipelines, and tools I've designed or implemented.

Reporting Platform

Incremental refactoring of production report generation services to improve performance, data consistency, and observability while maintaining system availability.

Problem

Report generation services lacked consistency, had poor observability, and struggled with SQL Server load as data volumes grew.

Outcomes

~40% SQL Server load reduction
~80% code coverage across services
Reduced MTTR from hours to minutes

Technologies

Scala · Kubernetes · MSSQL · Prometheus · Grafana

Production Platforms

Enterprise DSML Platform Selection & Architecture

AWS · Kubernetes · MLOps · Python · Apache Spark

Context

Organization needed a scalable, production-ready platform for data science and machine learning workloads with proper MLOps practices.

Multimodal Vector Search & Retrieval Platform

Rust · Scala · gRPC · Consistent Hashing · Prometheus

Context

Built a high-performance vector search system supporting multiple data modalities with horizontal scaling and consistent hashing.

Data & Processing Pipelines

Fuzzy Citation Matching Pipeline

Apache Flink · Scala · Graph Algorithms · Fuzzy Matching

Context

Needed to match citations across large document sets with high accuracy, handling variations in formatting, abbreviations, and partial matches.
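To make the matching problem concrete, here is a minimal sketch of one common fuzzy-matching approach: normalize each citation into a set of lowercase tokens and compare the sets with Jaccard similarity. It is written in plain Rust and does not use Flink or the bipartite-graph design of the actual pipeline. The function names (`tokens`, `similarity`, `best_match`) and the threshold parameter are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Normalize a citation: lowercase it, split on anything that is not
/// alphanumeric, and keep the distinct non-empty tokens.
fn tokens(citation: &str) -> HashSet<String> {
    citation
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|.
/// Returns 0.0 when neither citation contains any tokens.
pub fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let total = ta.union(&tb).count();
    if total == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f64 / total as f64
}

/// Return the candidate with the highest similarity to `query`,
/// provided that similarity is at least `threshold` (hypothetical cutoff).
pub fn best_match<'a>(query: &str, candidates: &[&'a str], threshold: f64) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .map(|c| (c, similarity(query, c)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(c, _)| c)
}
```

Token-set comparison tolerates reordering and punctuation differences, but it does not handle abbreviations or typos. A production matcher would likely add edit-distance or field-aware scoring on top.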

Libraries & Tooling

Cypher-to-Protobuf Translation Library

Rust · Tree-sitter · Protobuf · Cypher · Graph Databases

Context

Needed to parse Cypher graph queries into structured schemas for graph backends, enabling type-safe query construction.

Architecture Stories

How I think about tradeoffs

Design narratives behind key systems: constraints, options considered, and final choices.

Deep Debugging: Java Heap Dumps

Software Engineer · Platform / Observability · 2025

Deep‑dive into JVM heap‑dump debugging, showing how to capture and analyze dumps both locally and from Java services running in Kubernetes, to diagnose memory leaks and production OOM incidents with high confidence.

Impact

Reduced MTTR for Java memory incidents by standardizing heap‑dump workflows for local and Kubernetes‑deployed applications.
Improved accuracy of leak detection using structured heap‑dump analysis (dominator trees, retained‑size, and leak‑suspect reports).
Enabled safe, repeatable capture of large heap dumps from production pods via kubectl, jcmd/jmap, and OOME‑triggered dumps.

Java · JVM · Heap Dumps · Kubernetes · Observability · Performance Debugging

Report Generation: Performance & Consistency

Software Engineer · Axon · 2022–2025

Incremental refactoring of report generation services to improve performance, data consistency, and observability while maintaining system availability.

Impact

Reduced SQL Server load by roughly 40% through query optimization, caching, and incremental materialization of high‑traffic reports.
Increased automated test coverage to around 80%, improving confidence in refactors and reducing regressions in downstream reporting flows.
Reduced MTTR from hours to minutes by adding structured logging, dashboards, and alerting tied directly to report SLIs and error budgets.

Scala · Kubernetes · SQL Server · Prometheus · Grafana · Observability

ML Platform: Selection & Architecture

Enterprise Architect · CAS · 2021–2022

Selection and rollout of a production‑ready data science and machine learning platform, establishing MLOps best practices and reference architectures.

Impact

Accelerated ML project delivery by providing a standardized platform for experiment tracking, model training, and repeatable deployments to production.
Standardized MLOps practices around CI/CD, feature stores, model versioning, and monitoring, reducing operational friction for data scientists and engineers.
Established an ML community of practice that aligned teams on shared patterns, reusable components, and governance for responsible model usage.

AWS · Kubernetes · MLOps · Python · Apache Spark · Architecture

Cloud Migration: On-Prem to AWS

TAP Engineer · CAS · 2019–2021

Migration of on-premises applications to AWS, driving adoption of serverless and Kubernetes solutions to reduce operational costs and improve scalability.

Impact

Reduced operational costs by consolidating legacy on‑prem workloads into managed AWS services, decommissioning physical infrastructure, and optimizing resource utilization across environments.
Improved scalability and reliability by re‑platforming critical applications onto Kubernetes and serverless architectures with automated scaling, health checks, and multi‑AZ deployments.
Established cloud SDLC best practices by introducing infrastructure‑as‑code, standardized CI/CD pipelines, and guardrails for security, observability, and cost management across teams.

AWS · Kubernetes · Docker · Serverless · CDK · Cloud Architecture

Code & Snippets

Executable examples

Algorithms, data structures, and system behaviors you can run right here in the browser.

Consistent Hashing Implementation

Distributed Systems · Rust

A consistent hashing ring implementation for distributed data placement, used in the vector search platform.

consistent_hashing_implementation.rs
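The contents of that file are not reproduced in this text. As a rough illustration of the technique, the following is a minimal consistent hashing ring with virtual nodes. The type and method names (`HashRing`, `add_node`, `remove_node`, `node_for`), the replica count, and the use of the standard library's `DefaultHasher` are assumptions made for this sketch, not the implementation used in the vector search platform.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Map any hashable value to a position on the ring.
/// Note: DefaultHasher's algorithm is not guaranteed to stay the same across
/// Rust releases, so a real deployment would pick a fixed, stable hash.
fn ring_position<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A consistent hashing ring. Each physical node is placed on the ring
/// `replicas` times (virtual nodes) to spread keys more evenly.
pub struct HashRing {
    replicas: usize,
    ring: BTreeMap<u64, String>,
}

impl HashRing {
    pub fn new(replicas: usize) -> Self {
        HashRing { replicas, ring: BTreeMap::new() }
    }

    /// Insert all virtual nodes for `node`.
    pub fn add_node(&mut self, node: &str) {
        for i in 0..self.replicas {
            let pos = ring_position(&format!("{}#{}", node, i));
            self.ring.insert(pos, node.to_string());
        }
    }

    /// Remove every ring entry that belongs to `node`.
    pub fn remove_node(&mut self, node: &str) {
        self.ring.retain(|_, owner| owner != node);
    }

    /// Find the owner of `key`: the first virtual node at or after the key's
    /// position, wrapping around to the start of the ring if necessary.
    pub fn node_for(&self, key: &str) -> Option<&str> {
        let pos = ring_position(key);
        self.ring
            .range(pos..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, owner)| owner.as_str())
    }
}
```

The property that matters for data placement is that removing a node only reassigns the keys that node owned; every other key keeps its owner.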

Flink Citation Matching Pipeline

Data Pipelines · Scala

Scala implementation of a citation matching pipeline using Apache Flink for distributed processing.

flink_citation_matching_pipeline.scala

Distributed Job Orchestration

Data Pipelines · Scala

Scala implementation of a Dagster-like pipeline orchestrator for data engineering workflows.

distributed_job_orchestration.scala
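The Scala source is not reproduced in this text. The core scheduling idea behind a pipeline orchestrator, running each job only after its upstream jobs have finished, can be sketched with Kahn's topological sort. The sketch below is in Rust rather than Scala, and the function name `execution_order` and the dependency-map shape are assumptions made for illustration. It covers ordering only, not distribution, retries, or state tracking.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Given a map of job -> upstream jobs it depends on, return an order in
/// which every job appears after all of its dependencies.
/// Returns None if the dependency graph contains a cycle.
pub fn execution_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Collect every job name, including ones that only appear as dependencies.
    let mut jobs: BTreeSet<&'a str> = BTreeSet::new();
    for (job, upstream) in deps {
        jobs.insert(*job);
        jobs.extend(upstream.iter().copied());
    }

    // remaining[job] = number of upstream jobs that have not finished yet.
    let mut remaining: BTreeMap<&'a str, usize> = jobs.iter().map(|j| (*j, 0)).collect();
    // downstream[job] = jobs that are waiting on `job`.
    let mut downstream: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (job, upstream) in deps {
        for up in upstream {
            *remaining.get_mut(job).unwrap() += 1;
            downstream.entry(*up).or_default().push(*job);
        }
    }

    // Start with every job that has no unfinished dependencies.
    let mut ready: VecDeque<&'a str> = remaining
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(j, _)| *j)
        .collect();

    let mut order = Vec::with_capacity(jobs.len());
    while let Some(job) = ready.pop_front() {
        order.push(job.to_string());
        // "Finishing" a job unblocks the jobs waiting on it.
        for next in downstream.get(job).into_iter().flatten() {
            let n = remaining.get_mut(next).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(*next);
            }
        }
    }

    // If some jobs never became ready, they are part of a cycle.
    if order.len() == jobs.len() { Some(order) } else { None }
}
```

Using ordered maps keeps the result deterministic, which makes runs easier to compare when debugging a pipeline.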

LSM-Tree Implementation

Distributed Systems · Rust

A Log-Structured Merge Tree implementation with in-memory B-tree and on-disk SSTable storage, used for efficient write-heavy workloads.

lsm-tree_implementation.rs
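The contents of that file are not reproduced in this text. The following is a toy sketch of the write path and read path of an LSM tree: writes land in a sorted in-memory memtable, full memtables are flushed to immutable sorted runs, and reads check the newest data first. To stay self-contained, the sorted runs here are kept in memory rather than written to disk as SSTables. The names (`LsmTree`, `memtable_limit`, `compact`) are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// A toy LSM tree. `None` values are tombstones that mark deleted keys.
pub struct LsmTree {
    memtable: BTreeMap<String, Option<String>>,
    /// Flushed runs, oldest first; each run is sorted by key.
    /// A real implementation would store these on disk as SSTables.
    sstables: Vec<Vec<(String, Option<String>)>>,
    memtable_limit: usize,
}

impl LsmTree {
    pub fn new(memtable_limit: usize) -> Self {
        LsmTree { memtable: BTreeMap::new(), sstables: Vec::new(), memtable_limit }
    }

    pub fn put(&mut self, key: &str, value: &str) {
        self.write(key, Some(value.to_string()));
    }

    pub fn delete(&mut self, key: &str) {
        self.write(key, None);
    }

    /// All writes go to the memtable; flush once it reaches the size limit.
    fn write(&mut self, key: &str, value: Option<String>) {
        self.memtable.insert(key.to_string(), value);
        if self.memtable.len() >= self.memtable_limit {
            self.flush();
        }
    }

    /// Turn the memtable into an immutable sorted run.
    fn flush(&mut self) {
        let run: Vec<_> = std::mem::take(&mut self.memtable).into_iter().collect();
        if !run.is_empty() {
            self.sstables.push(run);
        }
    }

    /// Look in the memtable first, then in runs from newest to oldest.
    pub fn get(&self, key: &str) -> Option<&str> {
        if let Some(entry) = self.memtable.get(key) {
            return entry.as_deref();
        }
        for run in self.sstables.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return run[i].1.as_deref();
            }
        }
        None
    }

    /// Merge all runs into one. Newer entries overwrite older ones, and
    /// tombstones are dropped because no older run remains for them to shadow.
    pub fn compact(&mut self) {
        let mut merged = BTreeMap::new();
        for run in self.sstables.drain(..) {
            for (k, v) in run {
                merged.insert(k, v);
            }
        }
        let run: Vec<_> = merged.into_iter().filter(|(_, v)| v.is_some()).collect();
        if !run.is_empty() {
            self.sstables.push(run);
        }
    }
}
```

Writes never modify a flushed run, which is what makes the structure attractive for write-heavy workloads; the cost is that reads may have to consult several runs until compaction merges them.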

Intent & Philosophy

How I approach engineering work

What I optimize for when I design systems, collaborate with teams, and make tradeoffs.

I design and build distributed systems, data platforms, and cloud infrastructure. My work focuses on reliability, observability, and making systems that scale without breaking.

System Design & Architecture

Good architecture starts with understanding constraints: technical, business, and organizational. Rather than chasing the latest trends, I focus on solutions that balance performance, maintainability, and team velocity. I prefer incremental improvements over big-bang rewrites, and I always weigh operational complexity.

When designing distributed systems, observability comes first. You can't fix what you can't see. I design for failure, plan for scale, and always consider the human operators who will maintain the system. Whether choosing between microservices and monoliths, or selecting a data processing framework, I evaluate tradeoffs explicitly and document the reasoning.

Reliability & Observability

Production systems fail. The question is how quickly you can detect, diagnose, and resolve issues. I've seen too many systems where debugging production problems means grepping through logs or guessing at root causes. That's why metrics, tracing, and structured logging are integrated from the start.

The focus is on reducing mean-time-to-discovery (MTTD) and mean-time-to-resolution (MTTR). This means thoughtful alerting (not alert fatigue), comprehensive metrics (not just request counts), and clear runbooks. I've improved incident response by adding observability tooling, and I've seen how good monitoring can turn a 4-hour debugging session into a 10-minute fix.

On-call experience has taught me that reliability isn't just about preventing failures; it's about making failures manageable. Systems need graceful degradation, circuit breakers, and clear failure modes. When something breaks at 2 AM, the system should give you enough information to understand what's wrong and how to fix it.
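As one concrete example of a clear failure mode, here is a minimal circuit breaker sketch. After a configurable number of consecutive failures it stops allowing calls, then lets a trial call through once a cooldown has passed. The type and method names, the threshold, and the cooldown are assumptions made for illustration, not code from any system described here. Time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    /// Calls flow normally.
    Closed,
    /// Calls are rejected until the cooldown elapses.
    Open,
    /// Cooldown elapsed: a trial call is allowed to probe the dependency.
    HalfOpen,
}

pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker { failure_threshold, cooldown, failures: 0, opened_at: None }
    }

    /// Derive the current state from when the breaker last opened.
    pub fn state(&self, now: Instant) -> State {
        match self.opened_at {
            None => State::Closed,
            Some(t) if now.saturating_duration_since(t) >= self.cooldown => State::HalfOpen,
            Some(_) => State::Open,
        }
    }

    /// Whether a call should be attempted right now.
    pub fn allow(&self, now: Instant) -> bool {
        self.state(now) != State::Open
    }

    /// A successful call closes the breaker and resets the failure count.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.opened_at = None;
    }

    /// A failed call counts toward the threshold; reaching it opens the
    /// breaker, and a failure while half-open restarts the cooldown.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A caller would check `allow` before each request and report the outcome with `record_success` or `record_failure`. Exposing `state` as a metric gives on-call engineers a direct signal that a dependency is being shed.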

Mentoring, Interviewing, & Community Building

After conducting over 100 engineering interviews, I've learned that hiring is about finding people who can grow, not just people who know specific technologies. The focus is on system design thinking, problem-solving approaches, and cultural fit. Candidates get clear feedback, and the process is treated as a two-way conversation.

Mentoring is about creating space for others to learn and grow. I've worked with junior and mid-level developers on everything from debugging production issues to designing their first microservices. In practice, this means pairing, code reviews as teaching moments, and helping people understand the "why" behind decisions, not just the "what".

Communities of practice are powerful ways to share knowledge and standardize approaches. I started a machine learning community of practice at CAS to help teams learn from each other and establish best practices. These communities work best when they're bottom-up, focused on real problems, and have clear goals. I've seen how they can transform organizational culture and accelerate learning.

Education

M.S. Computer Science, Architecture (in progress)

Franklin University

B.S. Computer Science, Magna Cum Laude

Franklin University

Technologies

Languages

Scala, Rust, Java, Python, TypeScript/JavaScript

Platforms & Infrastructure

AWS, Docker, Kubernetes, Kafka, Spark, Flink

Observability

Prometheus, Grafana, Datadog, X-Ray

Contact

Let's talk

Whether you're hiring, looking for feedback, or just want to chat systems, feel free to reach out.

Get in Touch

If you’re working on large‑scale, mission‑critical platforms, I’m happy to brainstorm architectures, tradeoffs, or incident learnings.

Distributed systems & data platforms
Cloud architecture & ML infrastructure
Reliability, observability, and incident response
Technical leadership and cross‑team architecture reviews

Best way to reach me is email; I usually reply within 1–2 business days.

Email: colin.mtech@gmail.com
LinkedIn: linkedin.com/in/colin-r-moran
Location: Marysville, OH - Remote

Resume

Download Resume (PDF)