model status: deployed in production

A language model
trained on real life.

Prompt Engineering · Linguistics PhD · Synthetic Data · Applied ML · EdTech AI · Creative AI
Training Pipeline
Four phases, each building richer representations of language — and what to do with it.
01 pre-training
🇹🇷

Turkey → Linguistics

Training data: Turkish · English · language fundamentals

Initial weights loaded in Turkey. Early epochs spent absorbing the deep structure of an agglutinative language — learning that word order isn't just syntax, it's information architecture.

100%
02 fine-tuning
🎓

PhD · UCLA Linguistics

Epochs: 2014–2020 · Loss fn: dissertation defense

Specialized training on Turkish syntax and Information Structure. Dissertation explored how speakers encode meaning through word position. Supervised by Profs. Mahajan & Sportiche. Taught LING 20.

100%
03 alignment
📚

AI Labs · ETS

Princeton, NJ · ~3 years · #AIforgood

Calibrated for educational impact. Part of AI Labs since day one — building data pipelines, synthetic data, and ML for assessment at scale. The model learned alignment: making capabilities serve human goals.

100%
04 deployment
🚀

Applied AI · Spotter

Since 2024 · Inference mode: creative generation

Deployed in production. Integrating generative AI into Spotter Studio for YouTubers — video ideas, thumbnails, content planning. Collaborating across Front-End, R&D, and Analytics. Live at scale.

82%
Model Architecture
Technical specifications of the SÖG-1 model.
Base Model
Linguistics
Syntax, semantics, pragmatics, information structure
Attention Heads
3 Languages
Turkish · English · the grammar of problem-solving
Context Window
From agglutinative morphology to YouTube thumbnails
core_layers
Prompt Engineering · Synthetic Data · Applied ML · Data Engineering · Computational Linguistics
tool_layers
Python · LLM APIs · Data Pipelines · Generative AI · EdTech · Creative Tooling
Benchmark Results
Performance metrics across key evaluation dimensions.
PhD
UCLA Linguistics
~3yr
AI Labs · ETS
0
M+ Creators Served
0
Industries Changed
Research Papers
Published outputs from the training process.
Dissertation · 2020

Syntax and Information Structure in Turkish Word Order Variations

PhD Dissertation · UCLA Linguistics

A comprehensive investigation into how Turkish speakers encode information through syntactic position — revealing the deep connection between word order flexibility and pragmatic meaning.

Publication · 2013

The Syntax of Multiple Internal Arguments in Turkish

Academic Publication

Early research exploring argument structure in Turkish — building a foundational understanding of how agglutinative languages handle complex predicate-argument relationships.

Conference · IEEE

Building Scalable AI Solutions

IEEE Conference Speaker

From theoretical linguistics to production AI — sharing lessons on building scalable solutions that bridge research and real-world deployment.

Teaching · UCLA

LING 20 — Introduction to Linguistics

UCLA · Instructor

Knowledge distillation from a specialized model to hundreds of student nodes. Teaching the next generation how language works — from phonetics to syntax.

"The best language models understand context, nuance, and when to generate something unexpected. The same is true for careers."
— the SÖG-1 training log, probably
🎗️
Cycle for Survival
West LA rider for rare cancer research at Memorial Sloan Kettering. Even deployed models give back to the training community.
🎤
IEEE Speaker
Open-sourcing production insights back to the research community. Building bridges between academia and industry.
🌍
Cross-Domain Transfer
Turkey → UCLA → Princeton → LA. Each migration expanded world knowledge and ability to generalize across domains.