Chat Messages
Roles
The roles of chat messages follow the OpenAI API definition:
- SYSTEM: Indicates the associated content is part of the system prompt. It is generally the first message and provides guidance on how the model should behave.
- USER: Indicates the associated content is user input.
- ASSISTANT: Indicates the associated content is model-generated output.
- TOOL: Used when appending function-call results back into the conversation.
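As a rough illustration (the SDK's real enum is ChatMessage.Role; this standalone sketch only mirrors the list above), each role serializes to the lowercase role string used by the OpenAI API:

```kotlin
// Stand-in for ChatMessage.Role; the SDK's own enum may differ in detail.
enum class Role {
    SYSTEM, USER, ASSISTANT, TOOL;

    // OpenAI-compatible role string, e.g. "system", "user"
    val jsonName: String get() = name.lowercase()
}

// A minimal conversation in OpenAI message form:
val conversation = listOf(
    mapOf("role" to Role.SYSTEM.jsonName, "content" to "You are a helpful assistant."),
    mapOf("role" to Role.USER.jsonName, "content" to "Hello!"),
)
```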
Message Structure
Fields
- role: The role of this message (see ChatMessage.Role).
- content: A list of message contents. Each element is an instance of ChatMessageContent.
- reasoningContent: The reasoning content generated by reasoning models. Only messages generated by reasoning models have this field; for other models or other roles, it should be null.
- functionCalls: Function call requests generated by the model. See the Function Calling guide for more details.
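A standalone Kotlin sketch of the shape described above; the field names come from this list, but ChatMessageContent and the function-call type here are simplified stand-ins, not the SDK's actual definitions:

```kotlin
// Stand-in types; the real SDK defines ChatMessageContent as a sealed
// interface with Text, Image, and Audio variants.
sealed interface ChatMessageContent {
    data class Text(val text: String) : ChatMessageContent
}
data class FunctionCall(val name: String, val arguments: String)

data class ChatMessage(
    val role: String,                                   // see ChatMessage.Role
    val content: List<ChatMessageContent>,              // one or more content parts
    val reasoningContent: String? = null,               // reasoning models only; null otherwise
    val functionCalls: List<FunctionCall> = emptyList() // model-issued function-call requests
)

val msg = ChatMessage(role = "user", content = listOf(ChatMessageContent.Text("Hi")))
```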
toJSONObject
Returns a JSONObject that represents the chat message. The returned object is compatible with ChatCompletionRequestMessage from the OpenAI API. It contains 2 fields: role and content.

fromJSONObject
Constructs a ChatMessage instance from a JSONObject. Not all JSON object variants in ChatCompletionRequestMessage of the OpenAI API are acceptable. As of now, role supports user, system and assistant; content can be a string or an array. LeapSerializationException will be thrown if the provided JSONObject cannot be recognized as a message.

Message Content
ChatMessageContent is a sealed interface compatible with the content object in the OpenAI chat completion API. It has three variants:
- Text: Pure text content.
- Image: JPEG-encoded image content.
- Audio: WAV-encoded audio content.

toJSONObject returns an OpenAI API compatible content object (with a type field and the real content fields). fromJSONObject receives an OpenAI API compatible content object to build a message content. Not all OpenAI content objects are accepted; LeapSerializationException will be thrown if the provided JSONObject cannot be recognized as a message content.

ChatMessageContent.Text
Pure text content. The content is available in the text field.

ChatMessageContent.Image
Image content. Only JPEG-encoded data is supported. The fromBitmap helper function creates a ChatMessageContent.Image from an Android Bitmap object (the image will be compressed).

ChatMessageContent.Audio
Audio content for speech recognition and audio understanding. The inference engine requires WAV-encoded audio with specific format requirements (see Audio Format Requirements below).
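The three content variants serialize to OpenAI-style tagged objects. As a rough sketch built with plain Kotlin maps (the shapes follow the OpenAI chat completion API; the SDK's exact output may differ in detail):

```kotlin
// Tagged content objects in the OpenAI chat completion style,
// built as plain maps to keep the sketch dependency-free.
fun textPart(text: String): Map<String, Any> =
    mapOf("type" to "text", "text" to text)

fun imagePart(base64Jpeg: String): Map<String, Any> =
    mapOf(
        "type" to "image_url",
        "image_url" to mapOf("url" to "data:image/jpeg;base64,$base64Jpeg")
    )

fun audioPart(base64Wav: String): Map<String, Any> =
    mapOf(
        "type" to "input_audio",
        "input_audio" to mapOf("data" to base64Wav, "format" to "wav")
    )
```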
Audio Format Requirements
The LEAP inference engine requires WAV-encoded audio with specific format requirements:

| Property | Required Value | Notes |
|---|---|---|
| Format | WAV (RIFF) | Only WAV format is supported |
| Sample Rate | 16000 Hz (16 kHz) recommended | Other sample rates are automatically resampled to 16 kHz |
| Encoding | PCM (various bit depths) | Supports Float32, Int16, Int24, Int32 |
| Channels | Mono (1 channel) | Required - stereo audio will be rejected |
| Byte Order | Little-endian | Standard WAV format |
Supported PCM sample encodings:
- Float32: 32-bit floating point, normalized to [-1.0, 1.0]
- Int16: 16-bit signed integer, range [-32768, 32767] (recommended)
- Int24: 24-bit signed integer, range [-8388608, 8388607]
- Int32: 32-bit signed integer, range [-2147483648, 2147483647]
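The encodings above differ only in how samples are quantized. For example, Int16 samples relate to normalized Float32 samples as follows (a standalone sketch, not SDK code):

```kotlin
// Convert a 16-bit signed sample to a normalized float in [-1.0, 1.0].
fun int16ToFloat(s: Short): Float = s / 32768.0f

// Convert back, clamping and scaling to the Int16 range [-32768, 32767].
fun floatToInt16(f: Float): Short =
    (f.coerceIn(-1.0f, 1.0f) * 32767.0f).toInt().toShort()
```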
Automatic Resampling: The inference engine automatically resamples audio to 16 kHz if provided at a different sample rate. However, for best performance and quality, provide audio at 16 kHz to avoid resampling overhead.
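If you want to resample up front, a minimal linear-interpolation sketch looks like this (the engine's built-in resampler is likely higher quality; production code should also low-pass filter before downsampling):

```kotlin
// Naive linear-interpolation resampler for mono float PCM.
fun resample(input: FloatArray, srcRate: Int, dstRate: Int): FloatArray {
    if (srcRate == dstRate || input.isEmpty()) return input.copyOf()
    val outLen = (input.size.toLong() * dstRate / srcRate).toInt()
    return FloatArray(outLen) { i ->
        // Fractional source position for output sample i.
        val pos = i.toDouble() * srcRate / dstRate
        val j = pos.toInt().coerceAtMost(input.size - 1)
        val k = (j + 1).coerceAtMost(input.size - 1)
        val frac = (pos - j).toFloat()
        input[j] * (1 - frac) + input[k] * frac
    }
}
```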
Creating Audio Content from WAV Files
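Since only WAV input is accepted, it is worth validating the header before handing bytes to the SDK. A standalone Kotlin sketch using only the JDK (the final SDK call is shown as a comment because its exact signature is not reproduced here):

```kotlin
import java.io.File

// Quick sanity check on WAV (RIFF) bytes before building audio content.
fun isWav(bytes: ByteArray): Boolean =
    bytes.size >= 12 &&
    String(bytes, 0, 4, Charsets.US_ASCII) == "RIFF" &&
    String(bytes, 8, 4, Charsets.US_ASCII) == "WAVE"

fun loadWav(path: String): ByteArray {
    val bytes = File(path).readBytes()
    require(isWav(bytes)) { "Not a WAV (RIFF) file: $path" }
    // val audio = ChatMessageContent.Audio(bytes)  // hypothetical call; see the SDK reference
    return bytes
}
```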
Creating Audio Content from Raw PCM Samples
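As a standalone illustration of what this section attributes to FloatAudioBuffer, here is the 32-bit float PCM WAV container built with only the JDK (the SDK utility does this for you; this sketch is not its actual implementation):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Wrap mono Float32 PCM samples in a minimal WAV (RIFF) container.
// Format code 3 = IEEE float; all fields little-endian per the WAV spec.
fun floatsToWav(samples: FloatArray, sampleRate: Int = 16000): ByteArray {
    val dataSize = samples.size * 4
    val buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
    buf.put("RIFF".toByteArray()); buf.putInt(36 + dataSize); buf.put("WAVE".toByteArray())
    buf.put("fmt ".toByteArray()); buf.putInt(16)
    buf.putShort(3)              // audio format: IEEE float
    buf.putShort(1)              // channels: mono
    buf.putInt(sampleRate)
    buf.putInt(sampleRate * 4)   // byte rate = rate * channels * 4 bytes
    buf.putShort(4)              // block align
    buf.putShort(32)             // bits per sample
    buf.put("data".toByteArray()); buf.putInt(dataSize)
    for (s in samples) buf.putFloat(s)
    return buf.array()
}
```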
If you're recording audio or have raw PCM data, use the FloatAudioBuffer utility to create properly formatted WAV files. FloatAudioBuffer automatically creates a valid WAV header and encodes the samples as 32-bit float PCM in a WAV container, which is compatible with the inference engine.

Recording Audio
When recording audio from the device microphone, configure AudioRecord or use a library like WaveRecorder with the correct settings: 16 kHz sample rate, mono channel, PCM encoding.

Audio Duration Considerations
- Minimum duration: At least 1 second of audio is recommended for reliable speech recognition
- Maximum duration: Limited by the modelβs context window (typically several minutes)
- Silence: Trim excessive silence from the beginning and end for better results
Audio Output from Models
When generating audio responses (e.g., with LFM2.5-Audio-1.5B), the model outputs audio at a 24 kHz sample rate.
Note: Audio input should be 16 kHz, but audio output from generation models is typically 24 kHz. Make sure your audio playback code supports the correct sample rate.
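On the JVM, for example, the generated audio would be described to the audio system at 24 kHz rather than 16 kHz. A sketch with javax.sound.sampled (the bit depth and channel count here are illustrative assumptions; Android playback via AudioTrack follows the same idea):

```kotlin
import javax.sound.sampled.AudioFormat

// Playback format for model-generated audio: 24 kHz output,
// distinct from the 16 kHz expected on the input side.
val outputFormat = AudioFormat(
    24000f, // sample rate in Hz: model output is 24 kHz
    16,     // bits per sample (assumed for this sketch)
    1,      // mono (assumed for this sketch)
    true,   // signed PCM
    false   // little-endian
)
```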