If you are familiar with cloud-based AI APIs (e.g. the OpenAI API), this document shows the similarities and differences between cloud APIs and LeapSDK. We will inspect this Python-based OpenAI chat completion request and show how to achieve the same result with LeapSDK. The example is adapted from the OpenAI API documentation.
from openai import OpenAI
client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)

print("")
print("Generation done!")

Loading the Model

While cloud APIs let you use models immediately after creating a client, LeapSDK requires you to explicitly load the model first, because the model runs locally. This step generally takes a few seconds, depending on model size and device performance. With a cloud API, you create an API client:
client = OpenAI()
In LeapSDK, you download and load the model to create a model runner:
// Using LeapModelDownloader (Android - recommended)
val downloader = LeapModelDownloader(context)
val modelRunner = downloader.loadModel(
    modelSlug = "LFM2.5-1.2B-Instruct",
    quantizationSlug = "Q4_K_M"
)

// OR using LeapDownloader (cross-platform)
val downloader = LeapDownloader()
val modelRunner = downloader.loadModel(
    modelSlug = "LFM2.5-1.2B-Instruct",
    quantizationSlug = "Q4_K_M"
)
The return value is a “model runner” which plays a similar role to the client object in the cloud API — except that it carries the model weights. If the model runner is released, the app has to reload the model before requesting new generations.
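As a sketch of that lifecycle, using only calls that appear elsewhere in this guide (loadModel, createConversation, and the unload() call from the ViewModel example below), the release-then-reload flow looks like this. Treat it as an illustration of the lifecycle, not a definitive reference:

// Hypothetical lifecycle sketch: once unload() releases the weights,
// loadModel() must run again before any new generation.
suspend fun reloadExample(downloader: LeapModelDownloader) {
    var modelRunner = downloader.loadModel(
        modelSlug = "LFM2.5-1.2B-Instruct",
        quantizationSlug = "Q4_K_M"
    )

    modelRunner.unload() // weights released; the runner can no longer generate

    // Reload before requesting new generations.
    modelRunner = downloader.loadModel(
        modelSlug = "LFM2.5-1.2B-Instruct",
        quantizationSlug = "Q4_K_M"
    )
    val conversation = modelRunner.createConversation()
}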

Requesting Generation

In the cloud API, client.chat.completions.create returns a stream object:
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)
In LeapSDK, use generateResponse on the conversation object to get a stream for generation. Since the model runner already contains all model information, you don’t need to specify the model name again:
val conversation = modelRunner.createConversation()
val stream = conversation.generateResponse(
    ChatMessage(
        ChatMessage.Role.USER,
        listOf(ChatMessageContent.Text("Say 'double bubble bath' ten times fast."))
    )
)

// Simplified version with the same effect:
val stream = conversation.generateResponse("Say 'double bubble bath' ten times fast.")

Processing Generated Content

In cloud API Python code, a for-loop retrieves the content:
for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)

print("")
print("Generation done!")
In LeapSDK, call onEach on the Kotlin Flow to process content. Call collect() to start generation:
stream.onEach { chunk ->
    when (chunk) {
        is MessageResponse.Chunk -> {
            print(chunk.text)
        }
        else -> {}
    }
}.onCompletion {
    println()
    println("Generation done!")
}.collect()
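To see when each operator in that chain fires without the SDK installed, here is a minimal, self-contained sketch. The fakeStream() function is a hypothetical stand-in for the SDK's MessageResponse stream, emitting plain strings instead; the onEach/onCompletion/collect structure is the same:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.onEach
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.runBlocking

// Stand-in for the SDK stream: a Flow of text chunks.
fun fakeStream(): Flow<String> = flowOf("double ", "bubble ", "bath")

fun main() = runBlocking {
    fakeStream()
        .onEach { chunk -> print(chunk) }        // runs once per emitted chunk
        .onCompletion { println("\nGeneration done!") } // runs after the flow finishes
        .collect()                                // terminal operator: starts the flow
}
```

Note that nothing is emitted until collect() runs; onEach and onCompletion only describe what should happen, which mirrors how generation starts only when the LeapSDK stream is collected.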

Async Context

Most LeapSDK APIs are asynchronous, so you need an async context to execute them. On Android, LeapSDK uses Kotlin coroutines; launch work from a coroutine scope such as viewModelScope in a ViewModel:
class ChatViewModel(application: Application) : AndroidViewModel(application) {
    private val downloader = LeapModelDownloader(application)
    private var modelRunner: ModelRunner? = null
    private var conversation: Conversation? = null

    fun loadModelAndGenerate() {
        viewModelScope.launch {
            modelRunner = downloader.loadModel(
                modelSlug = "LFM2.5-1.2B-Instruct",
                quantizationSlug = "Q4_K_M"
            )

            conversation = modelRunner?.createConversation()

            conversation?.generateResponse("Say 'double bubble bath' ten times fast.")
                ?.onEach { chunk ->
                    when (chunk) {
                        is MessageResponse.Chunk -> print(chunk.text)
                        else -> {}
                    }
                }?.onCompletion {
                    println("\nGeneration done!")
                }?.collect()
        }
    }

    override fun onCleared() {
        super.onCleared()
        runBlocking(Dispatchers.IO) {
            modelRunner?.unload()
        }
    }
}

Next Steps

For more information, see the Quick Start Guide.