Core Architecture

LeapModelDownloader / LeapDownloader / Leap.load()
    ↓
ModelRunner
    ↓
Conversation
    ↓
MessageResponse (streaming)
The LEAP SDK uses Kotlin Multiplatform (KMP) to share core inference logic across Android, iOS, and macOS. Platform-specific wrappers (LeapModelDownloader on Android, Leap.load() on Apple) provide native ergonomics while the shared ModelRunner, Conversation, and MessageResponse layer remains consistent.

Installation

Gradle Dependencies

Recommended: Use a version catalog for dependency management.
# gradle/libs.versions.toml
[versions]
leapSdk = "0.10.0-SNAPSHOT"

[libraries]
leap-sdk = { module = "ai.liquid.leap:leap-sdk", version.ref = "leapSdk" }
leap-model-downloader = { module = "ai.liquid.leap:leap-model-downloader", version.ref = "leapSdk" }
// app/build.gradle.kts
dependencies {
    implementation(libs.leap.sdk)
    implementation(libs.leap.model.downloader)  // For Android notifications & background downloads
}
Alternative: Direct dependencies
// app/build.gradle.kts
dependencies {
    implementation("ai.liquid.leap:leap-sdk:0.10.0-SNAPSHOT")
    implementation("ai.liquid.leap:leap-model-downloader:0.10.0-SNAPSHOT")
}

Required Permissions

Add to AndroidManifest.xml:
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC" />

Runtime Permissions (Android 13+)

Request notification permission before downloading:
// In Activity
private val permissionLauncher = registerForActivityResult(
    ActivityResultContracts.RequestPermission()
) { isGranted ->
    if (isGranted) {
        // Permission granted, proceed with download
    } else {
        // Permission denied, handle gracefully
    }
}

// Before downloading
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.POST_NOTIFICATIONS) != PackageManager.PERMISSION_GRANTED) {
        permissionLauncher.launch(Manifest.permission.POST_NOTIFICATIONS)
    }
}

Loading Models

Method 1: Automatic Download and Load (Recommended)

The simplest approach — specify the model name and quantization, and the SDK handles downloading, caching, and loading:
import ai.liquid.leap.downloader.LeapModelDownloader
import ai.liquid.leap.downloader.LeapModelDownloaderNotificationConfig

class ChatViewModel(application: Application) : AndroidViewModel(application) {
    private val downloader = LeapModelDownloader(
        application,
        notificationConfig = LeapModelDownloaderNotificationConfig.build {
            notificationTitleDownloading = "Downloading AI model..."
            notificationTitleDownloaded = "Model ready!"
        }
    )

    private var modelRunner: ModelRunner? = null

    fun loadModel() {
        viewModelScope.launch {
            try {
                // Downloads if not cached, then loads
                modelRunner = downloader.loadModel(
                    modelSlug = "LFM2.5-1.2B-Instruct",
                    quantizationSlug = "Q4_K_M",
                    progress = { progressData ->
                        // progressData.progress: Float (0.0 to 1.0)
                        Log.d(TAG, "Progress: ${(progressData.progress * 100).toInt()}%")
                    }
                )
            } catch (e: Exception) {
                Log.e(TAG, "Failed to load model", e)
            }
        }
    }

    override fun onCleared() {
        super.onCleared()

        // Unload model asynchronously to avoid ANR
        // Do NOT use runBlocking - it blocks the main thread and can cause ANRs
        CoroutineScope(Dispatchers.IO).launch {
            try {
                modelRunner?.unload()
            } catch (e: Exception) {
                Log.e(TAG, "Error unloading model", e)
            }
        }
    }
}
Available models and quantizations: LEAP Model Library

Method 2: Download Without Loading

Separate download from loading for better control:
import ai.liquid.leap.downloader.LeapModelDownloader

class ChatViewModel(application: Application) : AndroidViewModel(application) {
    private val downloader = LeapModelDownloader(application)
    private var modelRunner: ModelRunner? = null

    // Step 1: Download model to cache (doesn't load into memory)
    suspend fun downloadModel() {
        try {
            downloader.downloadModel(
                modelSlug = "LFM2.5-1.2B-Instruct",
                quantizationSlug = "Q4_K_M",
                progress = { progressData ->
                    Log.d(TAG, "Download: ${(progressData.progress * 100).toInt()}%")
                }
            )
            // Model is now cached locally
        } catch (e: Exception) {
            Log.e(TAG, "Download failed", e)
        }
    }

    // Step 2: Later, load from cache (no download)
    suspend fun loadCachedModel() {
        try {
            modelRunner = downloader.loadModel(
                modelSlug = "LFM2.5-1.2B-Instruct",
                quantizationSlug = "Q4_K_M"
            )
            // Loads immediately from cache, no network request
        } catch (e: Exception) {
            Log.e(TAG, "Load failed", e)
        }
    }

    override fun onCleared() {
        super.onCleared()
        CoroutineScope(Dispatchers.IO).launch {
            try {
                modelRunner?.unload()
            } catch (e: Exception) {
                Log.e(TAG, "Error unloading model", e)
            }
        }
    }
}

Method 3: Cross-Platform LeapDownloader (Kotlin Multiplatform)

For KMP projects targeting iOS, macOS, JVM, and Android:
import ai.liquid.leap.LeapDownloader
import ai.liquid.leap.LeapDownloaderConfig

val downloader = LeapDownloader(
    config = LeapDownloaderConfig(saveDir = "/path/to/models")
)

// Load model (downloads if not cached)
val modelRunner = downloader.loadModel(
    modelSlug = "LFM2.5-1.2B-Instruct",
    quantizationSlug = "Q4_K_M"
)
LeapDownloader does not provide Android-specific features like notifications or WorkManager integration. Use LeapModelDownloader for better UX on Android.

Method 4: Custom Manifest URL (Swift only)

Load from a custom manifest:
let manifestURL = URL(string: "https://your-server.com/model-manifest.json")!

let modelRunner = try await Leap.load(
    manifestURL: manifestURL,
    downloadProgressHandler: { progress, speed in
        print("Progress: \(Int(progress * 100))%")
    }
)

Method 5: Local Bundle (Swift only, Legacy)

Load from a local .bundle or .gguf file:
guard let bundleURL = Bundle.main.url(forResource: "model", withExtension: "bundle") else {
    fatalError("Model bundle not found")
}

let modelRunner = try await Leap.load(
    url: bundleURL,
    options: LiquidInferenceEngineOptions(
        bundlePath: bundleURL.path,
        cpuThreads: 6,
        contextSize: 8192,
        nGpuLayers: 8  // Metal GPU acceleration on macOS
    )
)

Core Classes

ModelRunner

The loaded model instance. Create conversations from this.
Methods:
  • createConversation(systemPrompt: String? = null): Conversation — Start new chat
  • createConversationFromHistory(history: List<ChatMessage>): Conversation — Restore chat
  • suspend fun unload() — Free memory (MUST call in onCleared)
val conversation = modelRunner.createConversation(
    systemPrompt = "Explain it to me like I'm 5 years old"
)

// Or restore from saved history
val conversation = modelRunner.createConversationFromHistory(savedHistory)

Conversation

Manages chat history and generation.
Fields:
  • history: List<ChatMessage> — Full message history (returns a copy, immutable)
  • isGenerating: Boolean — Thread-safe generation status
Methods:
  • generateResponse(userTextMessage: String, options: GenerationOptions? = null): Flow<MessageResponse>
  • generateResponse(message: ChatMessage, options: GenerationOptions? = null): Flow<MessageResponse>
  • registerFunction(function: LeapFunction) — Add tool for function calling
  • appendToHistory(message: ChatMessage) — Add message without generating

ChatMessage

Represents a single message in the conversation.
data class ChatMessage(
    val role: Role,              // USER, ASSISTANT, SYSTEM, TOOL
    val content: List<ChatMessageContent>,
    val reasoningContent: String? = null,  // From reasoning models
    val functionCalls: List<LeapFunctionCall>? = null
)

enum class Role { USER, ASSISTANT, SYSTEM, TOOL }

ChatMessageContent

Content types supported in messages.
ChatMessageContent.Text(text: String)
ChatMessageContent.Image(jpegByteArray: ByteArray)  // JPEG only
ChatMessageContent.Audio(wavByteArray: ByteArray)   // WAV only
Audio Requirements (both platforms):
  • Format: WAV (RIFF) only — no MP3/AAC/OGG
  • Sample Rate: 16 kHz
  • Encoding: PCM (Float32, Int16, Int24, or Int32)
  • Channels: Mono (1 channel) — stereo will be rejected
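For reference, the byte layout these requirements describe can be produced by hand. The sketch below is our own illustrative helper, not SDK code: it wraps Float samples as a canonical 44-byte-header WAV file (16 kHz, mono, Int16 PCM).

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Our own illustrative helper (not SDK code): wrap Float samples (-1.0..1.0)
// as a WAV file matching the input requirements above — 16 kHz, mono, Int16 PCM.
fun floatsToWav16kMono(samples: FloatArray, sampleRate: Int = 16_000): ByteArray {
    val dataSize = samples.size * 2  // 2 bytes per Int16 sample
    val buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
    buf.put("RIFF".toByteArray())
    buf.putInt(36 + dataSize)              // remaining RIFF chunk size
    buf.put("WAVE".toByteArray())
    buf.put("fmt ".toByteArray())
    buf.putInt(16)                         // fmt chunk size for plain PCM
    buf.putShort(1)                        // audio format 1 = PCM
    buf.putShort(1)                        // channels: mono
    buf.putInt(sampleRate)
    buf.putInt(sampleRate * 2)             // byte rate = rate * channels * bytesPerSample
    buf.putShort(2)                        // block align
    buf.putShort(16)                       // bits per sample
    buf.put("data".toByteArray())
    buf.putInt(dataSize)
    for (s in samples) {
        buf.putShort((s.coerceIn(-1f, 1f) * Short.MAX_VALUE).toInt().toShort())
    }
    return buf.array()
}
```

In practice the SDK's FloatAudioBuffer.createWavBytes() does this for you; the sketch only makes the expected byte layout explicit.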

MessageResponse

Streaming response types from generation.
MessageResponse.Chunk(text: String)                    // Text token
MessageResponse.ReasoningChunk(reasoning: String)      // Thinking (LFM2.5-1.2B-Thinking)
MessageResponse.FunctionCalls(functionCalls: List<LeapFunctionCall>)  // Tool calls requested
MessageResponse.AudioSample(samples: FloatArray, sampleRate: Int)  // Audio output (24kHz)
MessageResponse.Complete(
    fullMessage: ChatMessage,
    finishReason: GenerationFinishReason,  // STOP or EXCEED_CONTEXT
    stats: GenerationStats?                // Token counts, tokens/sec
)

GenerationOptions

Control generation behavior.
val options = GenerationOptions(
    temperature = 0.7f,              // Randomness (0.0 = deterministic, 1.0+ = creative)
    topP = 0.9f,                     // Nucleus sampling
    minP = 0.05f,                    // Minimum probability
    repetitionPenalty = 1.1f,        // Prevent repetition
    jsonSchemaConstraint = """{"type":"object",...}""",  // Force JSON output
    functionCallParser = LFMFunctionCallParser(),  // Enable function calling (null to disable)
    inlineThinkingTags = false       // Emit ReasoningChunk separately (for thinking models)
)

conversation.generateResponse(userInput, options).collect { ... }
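Conceptually, topP and minP both prune the token distribution before sampling: nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches topP, and minP drops tokens below a probability floor. The sketch below is a dependency-free demonstration of that idea, not the SDK's actual sampler.

```kotlin
// Illustrative sketch of nucleus (top-p) and min-p filtering over a token
// probability distribution — our own demonstration, not the SDK's sampler.
fun filterTopPMinP(probs: Map<String, Double>, topP: Double, minP: Double): Set<String> {
    val kept = mutableSetOf<String>()
    var cumulative = 0.0
    for ((token, p) in probs.entries.sortedByDescending { it.value }) {
        if (p < minP) break            // sorted descending, so nothing later passes either
        kept += token
        cumulative += p
        if (cumulative >= topP) break  // nucleus reached
    }
    return kept
}
```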

Generation Patterns

Basic Text Generation

class ChatViewModel : ViewModel() {
    private var generationJob: Job? = null
    private val _responseText = MutableStateFlow("")

    fun generate(userInput: String) {
        generationJob?.cancel()  // Cancel previous generation

        generationJob = viewModelScope.launch {
            conversation?.generateResponse(userInput)
                ?.onEach { response ->
                    when (response) {
                        is MessageResponse.Chunk -> {
                            _responseText.value += response.text
                        }
                        is MessageResponse.Complete -> {
                            Log.d(TAG, "Tokens/sec: ${response.stats?.tokenPerSecond}")
                        }
                        else -> {}
                    }
                }
                ?.catch { e ->
                    // Handle error
                }
                ?.collect()
        }
    }

    fun stopGeneration() {
        generationJob?.cancel()
    }
}
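The chunk-accumulation pattern above can be shown without the SDK at all. Below is a dependency-free sketch with stand-in types of our own (the real SDK streams MessageResponse values through a Flow):

```kotlin
// Stand-in types for the sketch — the real SDK's MessageResponse has more variants.
sealed interface Resp {
    data class Chunk(val text: String) : Resp
    data class Complete(val full: String) : Resp
}

// Append each Chunk's text as it arrives; Complete carries the final message.
fun accumulate(stream: List<Resp>): String {
    val sb = StringBuilder()
    for (r in stream) when (r) {
        is Resp.Chunk -> sb.append(r.text)
        is Resp.Complete -> return r.full
    }
    return sb.toString()
}
```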

Multimodal Input (Vision)

val imageBytes = File("image.jpg").readBytes()  // JPEG only

val message = ChatMessage(
    role = ChatMessage.Role.USER,
    content = listOf(
        ChatMessageContent.Image(imageBytes),
        ChatMessageContent.Text("What's in this image?")
    )
)

conversation.generateResponse(message).collect { ... }

Audio Input

import ai.liquid.leap.audio.FloatAudioBuffer

// From raw PCM samples
val audioBuffer = FloatAudioBuffer(sampleRate = 16000)
audioBuffer.add(floatArrayOf(...))  // Float samples normalized -1.0 to 1.0
val wavBytes = audioBuffer.createWavBytes()

val message = ChatMessage(
    role = ChatMessage.Role.USER,
    content = listOf(
        ChatMessageContent.Audio(wavBytes),
        ChatMessageContent.Text("Transcribe this audio")
    )
)

conversation.generateResponse(message).collect { ... }

Audio Output (Text-to-Speech)

val audioSamples = mutableListOf<FloatArray>()

conversation.generateResponse("Say hello").collect { response ->
    when (response) {
        is MessageResponse.AudioSample -> {
            // samples: FloatArray (Float32 PCM, -1.0 to 1.0)
            // sampleRate: Int (typically 24000 Hz)
            audioSamples.add(response.samples)
            playAudio(response.samples, response.sampleRate)
        }
        else -> {}  // ignore text/other responses here
    }
}

Function Calling

Register functions for the model to invoke. See also the Function Calling guide.
// 1. Define function
val getWeather = LeapFunction(
    name = "get_weather",
    description = "Get current weather for a city",
    parameters = """
        {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    """
)

// 2. Register function
conversation.registerFunction(getWeather)

// 3. Handle function calls
conversation.generateResponse("What's the weather in Tokyo?").collect { response ->
    when (response) {
        is MessageResponse.FunctionCalls -> {
            response.functionCalls.forEach { call ->
                // call.name: String
                // call.arguments: String (JSON)
                val result = executeTool(call.name, call.arguments)

                // Add result back to conversation
                val toolMessage = ChatMessage(
                    role = ChatMessage.Role.TOOL,
                    content = listOf(ChatMessageContent.Text(result))
                )
                conversation.appendToHistory(toolMessage)

                // Generate next response
                conversation.generateResponse("").collect { ... }
            }
        }
        else -> {}
    }
}
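The executeTool call above is left to the app. One way to implement it is a simple name-to-handler registry; the class and method names below are our own sketch, not SDK API.

```kotlin
// Hypothetical dispatcher for function calls: routes a call name to a
// registered handler that takes the JSON arguments and returns a JSON result.
class ToolRegistry {
    private val handlers = mutableMapOf<String, (String) -> String>()

    fun register(name: String, handler: (String) -> String) {
        handlers[name] = handler
    }

    fun executeTool(name: String, argumentsJson: String): String =
        handlers[name]?.invoke(argumentsJson)
            ?: """{"error": "unknown tool: $name"}"""
}
```

Registering a handler per LeapFunction you register on the conversation keeps the two lists in sync.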

Structured Output (Constrained Generation)

Use the @Generatable annotation/macro for type-safe JSON output. See also the Constrained Generation guide.
@Serializable
@Generatable("Recipe information")
data class Recipe(
    val name: String,
    val ingredients: List<String>,
    val steps: List<String>
)

val options = GenerationOptions().apply {
    setResponseFormatType<Recipe>()  // Auto-generates JSON schema
}

conversation.generateResponse("Generate a pasta recipe", options).collect { response ->
    if (response is MessageResponse.Complete) {
        val text = (response.fullMessage.content.first() as ChatMessageContent.Text).text
        val recipe = LeapJson.decodeFromString<Recipe>(text)
    }
}

Conversation Persistence

// Save conversation
val json = LeapJson.encodeToString(conversation.history)

// Restore conversation
val history = LeapJson.decodeFromString<List<ChatMessage>>(json)
val conversation = modelRunner.createConversationFromHistory(history)

Model Download Management

Query download status and manage cached models.
import ai.liquid.leap.downloader.LeapModelDownloader

val downloader = LeapModelDownloader(application)

// Query status for a specific model
viewModelScope.launch {
    val status = downloader.queryStatus(
        modelSlug = "LFM2.5-1.2B-Instruct",
        quantizationSlug = "Q4_K_M"
    )

    when (status) {
        is ModelDownloadStatus.NotOnLocal -> {
            Log.d(TAG, "Model not downloaded")
        }
        is ModelDownloadStatus.DownloadInProgress -> {
            val progressPercent = (status.progress * 100).toInt()
            Log.d(TAG, "Downloading: $progressPercent%")
        }
        is ModelDownloadStatus.Downloaded -> {
            Log.d(TAG, "Model ready to load")
        }
    }
}

// Get total model size before downloading
val totalBytes = downloader.getModelSize(
    modelSlug = "LFM2.5-1.2B-Instruct",
    quantizationSlug = "Q4_K_M"
)
val totalMB = totalBytes / (1024 * 1024)

// Remove a specific model from cache
downloader.removeModel(
    modelSlug = "LFM2.5-1.2B-Instruct",
    quantizationSlug = "Q4_K_M"
)

// Cancel an in-progress download
downloader.cancelDownload(
    modelSlug = "LFM2.5-1.2B-Instruct",
    quantizationSlug = "Q4_K_M"
)
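getModelSize returns a raw byte count; for UI display, a small formatting helper (our own, not part of the SDK) may be useful:

```kotlin
// Display helper of our own (not SDK API) for a raw byte count such as the
// value returned by getModelSize.
fun formatBytes(bytes: Long): String = when {
    bytes >= 1L shl 30 -> String.format(java.util.Locale.ROOT, "%.1f GB", bytes.toDouble() / (1L shl 30))
    bytes >= 1L shl 20 -> String.format(java.util.Locale.ROOT, "%.0f MB", bytes.toDouble() / (1L shl 20))
    else -> "$bytes B"
}
```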
Download Status Types:
sealed interface ModelDownloadStatus {
    object NotOnLocal : ModelDownloadStatus
    data class DownloadInProgress(val progress: Float) : ModelDownloadStatus  // 0.0 to 1.0
    object Downloaded : ModelDownloadStatus
}

Complete ViewModel Example

import ai.liquid.leap.*
import ai.liquid.leap.downloader.*
import ai.liquid.leap.message.*
import android.app.Application
import androidx.lifecycle.AndroidViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

class ChatViewModel(application: Application) : AndroidViewModel(application) {
    private val downloader = LeapModelDownloader(
        application,
        notificationConfig = LeapModelDownloaderNotificationConfig.build {
            notificationTitleDownloading = "Downloading model..."
            notificationTitleDownloaded = "Model ready!"
        }
    )

    private var modelRunner: ModelRunner? = null
    private var conversation: Conversation? = null
    private var generationJob: Job? = null

    private val _messages = MutableStateFlow<List<ChatMessage>>(emptyList())
    val messages: StateFlow<List<ChatMessage>> = _messages.asStateFlow()

    private val _isLoading = MutableStateFlow(false)
    val isLoading: StateFlow<Boolean> = _isLoading.asStateFlow()

    private val _isGenerating = MutableStateFlow(false)
    val isGenerating: StateFlow<Boolean> = _isGenerating.asStateFlow()

    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse.asStateFlow()

    fun loadModel() {
        viewModelScope.launch {
            _isLoading.value = true
            try {
                modelRunner = downloader.loadModel(
                    modelSlug = "LFM2.5-1.2B-Instruct",
                    quantizationSlug = "Q4_K_M"
                )
                conversation = modelRunner?.createConversation(
                    systemPrompt = "Explain it to me like I'm 5 years old"
                )
            } catch (e: Exception) {
                // Handle error
            } finally {
                _isLoading.value = false
            }
        }
    }

    fun sendMessage(text: String) {
        generationJob?.cancel()
        _currentResponse.value = ""

        generationJob = viewModelScope.launch {
            _isGenerating.value = true
            try {
                conversation?.generateResponse(text)
                    ?.onEach { response ->
                        when (response) {
                            is MessageResponse.Chunk -> {
                                _currentResponse.value += response.text
                            }
                            is MessageResponse.Complete -> {
                                _messages.value = conversation?.history ?: emptyList()
                                _currentResponse.value = ""
                            }
                            else -> {}
                        }
                    }
                    ?.catch { e ->
                        // Handle generation error
                    }
                    ?.collect()
            } finally {
                _isGenerating.value = false
            }
        }
    }

    fun stopGeneration() {
        generationJob?.cancel()
        _isGenerating.value = false
    }

    override fun onCleared() {
        super.onCleared()
        generationJob?.cancel()
        CoroutineScope(Dispatchers.IO).launch {
            try {
                modelRunner?.unload()
            } catch (e: Exception) {
                Log.e(TAG, "Error unloading model", e)
            }
        }
    }
}

Error Handling

sealed class LeapException : Exception()
class LeapModelLoadingException : LeapException()
class LeapGenerationException : LeapException()
class LeapGenerationPromptExceedContextLengthException : LeapException()
class LeapSerializationException : LeapException()

try {
    modelRunner = downloader.loadModel(...)
} catch (e: LeapModelLoadingException) {
    // Model failed to load
} catch (e: LeapGenerationPromptExceedContextLengthException) {
    // Prompt too long
} catch (e: Exception) {
    // Other errors
}
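First-time loads depend on the network, so transient failures are common. A generic retry-with-backoff wrapper can be layered on top; this is our own helper (the SDK provides no such function), and the attempt count and delays are illustrative defaults.

```kotlin
// Our own generic retry wrapper for transient failures (e.g. flaky network on
// the first model download). Not part of the SDK.
fun <T> retrying(attempts: Int = 3, baseDelayMs: Long = 500, block: () -> T): T {
    var lastError: Exception? = null
    repeat(attempts) { attempt ->
        try {
            return block()
        } catch (e: Exception) {
            lastError = e
            if (attempt < attempts - 1) {
                Thread.sleep(baseDelayMs * (1L shl attempt))  // exponential backoff
            }
        }
    }
    throw lastError ?: IllegalStateException("retrying called with attempts <= 0")
}
```

In coroutine code, prefer kotlinx.coroutines delay over Thread.sleep inside the catch branch.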

Imports Reference

Android (LeapModelDownloader):
import ai.liquid.leap.Conversation
import ai.liquid.leap.ModelRunner
import ai.liquid.leap.downloader.LeapModelDownloader
import ai.liquid.leap.downloader.LeapModelDownloaderNotificationConfig
import ai.liquid.leap.message.ChatMessage
import ai.liquid.leap.message.ChatMessageContent
import ai.liquid.leap.message.MessageResponse
import ai.liquid.leap.generation.GenerationOptions
import ai.liquid.leap.LeapException
Cross-Platform (LeapDownloader):
import ai.liquid.leap.Conversation
import ai.liquid.leap.ModelRunner
import ai.liquid.leap.LeapDownloader
import ai.liquid.leap.LeapDownloaderConfig
import ai.liquid.leap.message.ChatMessage
import ai.liquid.leap.message.ChatMessageContent
import ai.liquid.leap.message.MessageResponse
import ai.liquid.leap.generation.GenerationOptions

Model Selection Guide

Text Models

  • LFM2.5-1.2B-Instruct: General purpose (recommended)
  • LFM2.5-1.2B-Thinking: Extended reasoning (emits ReasoningChunk)
  • LFM2-1.2B: Stable version
  • LFM2-1.2B-Tool: Optimized for function calling

Multimodal Models

  • LFM2.5-VL-1.6B: Vision + text
  • LFM2.5-Audio-1.5B: Audio + text (TTS, ASR, voice chat)

Quantization Guide

Choose the right balance of speed vs quality:
Quantization  Quality        Size      Speed    Use Case
Q4_0          Lowest         Smallest  Fastest  Prototyping, low-end devices
Q4_K_M        Good           Small     Fast     Recommended for most apps
Q5_K_M        Better         Medium    Medium   Quality-sensitive applications
Q6_K          High           Large     Slower   High-quality responses needed
Q8_0          Near-original  Larger    Slow     Maximum quality
F16           Original       Largest   Slowest  Research, benchmarking
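The trade-offs above can be folded into a simple device-aware picker. The RAM thresholds below are our own illustrative assumptions, not official SDK guidance:

```kotlin
// Sketch of a device-aware quantization picker based on the table above.
// Thresholds are illustrative assumptions, not SDK guidance.
fun pickQuantization(availableRamGb: Int, qualitySensitive: Boolean): String = when {
    availableRamGb < 4 -> "Q4_0"       // low-end devices: smallest, fastest
    !qualitySensitive -> "Q4_K_M"      // recommended default
    availableRamGb < 8 -> "Q5_K_M"     // quality-sensitive, mid-range RAM
    else -> "Q8_0"                     // quality-sensitive, plenty of RAM
}
```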

Critical Best Practices

1. Model Unloading (REQUIRED)

Always release model resources when you are done. On Android, unload asynchronously to avoid ANR (Application Not Responding) errors. On iOS, nil out the references.
override fun onCleared() {
    super.onCleared()

    // Unload model asynchronously to avoid ANR
    // NEVER use runBlocking - it blocks the main thread and causes ANRs
    CoroutineScope(Dispatchers.IO).launch {
        try {
            modelRunner?.unload()
        } catch (e: Exception) {
            Log.e(TAG, "Error unloading model", e)
        }
    }
}

2. Generation Cancellation

// Generation auto-cancels when Flow collection is cancelled
generationJob?.cancel()

// Or when viewModelScope is cleared (ViewModel destroyed)

3. Thread Safety

  • All SDK operations are main-thread safe on both platforms
  • Kotlin: Use viewModelScope.launch for all suspend functions
  • Swift: Use @MainActor for UI-bound ViewModels and Task {} for async work
  • Callbacks run on the main thread

4. History Management

Both platforms return a copy of the history that is safe to read without synchronization:
// conversation.history returns a COPY
val history = conversation.history  // Safe to read

// To restore conversation
val newConversation = modelRunner.createConversationFromHistory(savedHistory)

5. Serialization

// Save conversation
val json = LeapJson.encodeToString(conversation.history)

// Restore conversation
val history = LeapJson.decodeFromString<List<ChatMessage>>(json)
val conversation = modelRunner.createConversationFromHistory(history)

Troubleshooting

Model Fails to Load

  • Check internet connection (first download requires network)
  • Android: Verify minSdk = 31 in build.gradle.kts; use physical device (emulators may crash)
  • iOS/macOS: Test on physical device (simulator is much slower)
  • Check storage space — models typically need 500MB to 2GB

Generation is Slow

  • Test on a physical device (simulators and emulators are much slower)
  • Use smaller quantization (Q4_K_M instead of Q8_0)
  • Reduce context size in options
  • macOS: Increase nGpuLayers for Metal GPU acceleration

Audio Not Working

  • Verify WAV format (16kHz, mono, PCM) — no MP3/AAC/OGG
  • Check that the model supports audio (LFM2.5-Audio models)
  • Ensure mono channel — stereo will be rejected
  • Audio output is typically 24kHz (different from 16kHz input)
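When audio is rejected, a quick header check helps pinpoint which requirement is violated. The sketch below is a debugging aid of our own (not SDK API) that validates the RIFF/WAVE header fields against the 16 kHz mono PCM requirement:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Our own debugging aid (not SDK API): sanity-check a WAV byte array against
// the SDK's input requirements — RIFF/WAVE container, mono, 16 kHz.
fun looksLikeValidLeapWav(wav: ByteArray): Boolean {
    if (wav.size < 44) return false                       // too small for a WAV header
    val buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN)
    val riff = String(wav, 0, 4)
    val wave = String(wav, 8, 4)
    val channels = buf.getShort(22).toInt()               // fmt chunk: channel count
    val sampleRate = buf.getInt(24)                       // fmt chunk: sample rate
    return riff == "RIFF" && wave == "WAVE" && channels == 1 && sampleRate == 16_000
}
```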

Memory Issues

  • Always unload the model when done (see Critical Best Practices above)
  • Do not load multiple models simultaneously
  • Use appropriate quantization (Q4_K_M recommended)
  • Use smaller models on devices with limited RAM (e.g., LFM2-350M for 3GB devices, LFM2.5-1.2B for 6GB+ devices)

Generation Fails

  • Check prompt length vs context window
  • Verify the model supports the feature you are using (vision, audio, function calling)
  • Check isGenerating before starting a new generation

Platform Requirements

Requirement   Android                     iOS                         macOS
Minimum OS    API 31 (Android 12)         14.0+                       11.0+
Build tools   Gradle + AGP                Xcode 15+ / Swift 5.9+      Xcode 15+ / Swift 5.9+
Distribution  Maven (Gradle)              SPM                         SPM
Device RAM    3GB min (6GB+ recommended)  3GB min (6GB+ recommended)  6GB+ recommended
Storage       500MB - 2GB per model       500MB - 2GB per model       500MB - 2GB per model