视觉

我们最新的 Mistral Small 模型和 Pixtral 模型具备视觉能力，使它们能够分析图像并基于视觉内容和文本提供见解。这种多模态方法为需要同时理解文本和视觉的应用开启了新的可能性。

具备视觉能力的模型：

Pixtral 12B (pixtral-12b-latest)
Pixtral Large 2411 (pixtral-large-latest)
Mistral Medium 2505 (mistral-medium-latest)
Mistral Small 2503 (mistral-small-latest)

传递图像 URL

如果图像托管在网上，您只需在请求中提供图像的 URL 即可。此方法直接简单，无需任何编码。

python
typescript
curl

import os
from mistralai import Mistral

# Retrieve the API key from environment variables
api_key = os.environ["MISTRAL_API_KEY"]

# Specify model
model = "pixtral-12b-2409"

# Initialize the Mistral client
client = Mistral(api_key=api_key)

# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)

import { Mistral } from "@mistralai/mistralai";

const apiKey = process.env["MISTRAL_API_KEY"];

const client = new Mistral({ apiKey: apiKey });

const chatResponse = await client.chat.complete({
  model: "pixtral-12b",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          imageUrl: "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
        },
      ],
    },
  ],
});

console.log("JSON:", chatResponse.choices[0].message.content);

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

传递 Base64 编码图像

如果您有本地存储的图像或一组图像，可以将其以 Base64 编码格式传递给模型。Base64 编码是一种常用的方法，用于将二进制数据转换为文本格式，以便在互联网上传输。这在需要将图像包含在 API 请求中时特别有用。

python
typescript
curl

import base64
import requests
import os
from mistralai import Mistral

def encode_image(image_path):
    """Encode the image to base64."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    except FileNotFoundError:
        print(f"Error: The file {image_path} was not found.")
        return None
    except Exception as e:  # Added general exception handling
        print(f"Error: {e}")
        return None

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

# Retrieve the API key from environment variables
api_key = os.environ["MISTRAL_API_KEY"]

# Specify model
model = "pixtral-12b-2409"

# Initialize the Mistral client
client = Mistral(api_key=api_key)

# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": f"data:image/jpeg;base64,{base64_image}" 
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model=model,
    messages=messages
)

# Print the content of the response
print(chat_response.choices[0].message.content)

import { Mistral } from "@mistralai/mistralai";
import fs from 'fs';

async function encodeImage(imagePath) {
    try {
        // Read the image file as a buffer
        const imageBuffer = fs.readFileSync(imagePath);

        // Convert the buffer to a Base64-encoded string
        const base64Image = imageBuffer.toString('base64');
        return base64Image;
    } catch (error) {
        console.error(`Error: ${error}`);
        return null;
    }
}

// Path to your image
const imagePath = "path_to_your_image.jpg"

// Getting the base64 string
const base64Image = await encodeImage(imagePath)

const apiKey = process.env["MISTRAL_API_KEY"];

const client = new Mistral({ apiKey: apiKey });

const chatResponse = await client.chat.complete({
  model: "pixtral-12b",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          imageUrl: f"data:image/jpeg;base64," + base64Image,
        },
      ],
    },
  ],
});

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": "data:image/jpeg;base64,<base64_image>"
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

用例

理解图表

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": "https://cdn.statcdn.com/Infographic/images/normal/30322.jpeg"
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

模型输出

The chart is a bar chart titled 'France's Social Divide,' comparing socio-economic indicators between disadvantaged areas and the whole of France. It comprises two sections: the first section includes three bar groups representing the percentage of people part of the working-class, unemployment rate, and percentage of 16-25-year-olds not in school and unemployed. The second section includes three bar groups representing median monthly income, poverty rate, and households living in overcrowded housing. Each bar group contains two bars: one for disadvantaged areas (red) and one for the whole of France (blue). The data indicate that disadvantaged areas have higher percentages of working-class individuals (33.5% vs. 14.5%), unemployment (18.1% vs. 7.3%), and young people not in school and unemployed (25.2% vs. 12.9%). They also show a lower median monthly income (€1,168 vs. €1,822), a higher poverty rate (43.3% vs. 15.5%), and a higher percentage of households living in overcrowded housing (22.0% vs. 8.7%). The chart highlights significant disparities in socio-economic conditions between disadvantaged areas and the rest of France, emphasizing the challenges faced by these communities.

比较图像

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "what are the differences between two images?"
          },
          {
            "type": "image_url",
            "image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

模型输出

The first image features the Eiffel Tower surrounded by snow-covered trees and pathways, with a clear view of the tower's intricate iron lattice structure. The second image shows the Eiffel Tower in the background of a large, outdoor stadium filled with spectators, with a red tennis court in the center. The most notable differences are the setting - one is a winter scene with snow, while the other is a summer scene with a crowd at a sporting event. The mood of the first image is serene and quiet, whereas the second image conveys a lively and energetic atmosphere. These differences highlight the versatility of the Eiffel Tower as a landmark that can be enjoyed in various contexts and seasons.

转录收据

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "transcribe this receipt"
          },
          {
            "type": "image_url",
            "image_url": "https://www.boredpanda.com/blog/wp-content/uploads/2022/11/interesting-receipts-102-6364c8d181c6a__700.jpg"
          }
        ]
      }
    ]
  }'

模型输出

\nDine-In\n\nReceipt Details\nDate: 02-Apr-2022\nTime: 5:01:56 PM\nCashier: Raul\n\nItems Purchased:\n1 Empanada - Beef         $3.00\n1 Empanada - Cheese       $3.00\n1 Empanada - Chicken      $3.00\n1 Tallarin Huancaina Lomo Saltado  $19.99\n1 1/2 Pisco Sour          $15.00\n\nSubtotal                   $43.99\nLocal Taxes (5.5%)        $2.42\nTotal                     $46.41\n\nMessage: IMMIGRANTS MAKE AMERICA GREAT THEY ALSO COOKED YOUR FOOD AND SERVED YOU TODAY GOD BLESS YOU\n\nOrder ID: D0BQZ3R656MDC\n\nLinks:\n- Online Ordering: https://clover.com/r/D0BQZ3R656MDC\n- Clover Privacy Policy: https://clover.com/privacy\n

转录旧文档

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "transcribe this"
          },
          {
            "type": "image_url",
            "image_url": "https://ciir.cs.umass.edu/irdemo/hw-demo/page_example.jpg"
          }
        ]
      }
    ]
  }'

模型输出

# Letters Orders and Instructions December 1855\n\n**Hoag's Company, if any opportunity offers.**\n\nYou are to be particularly exact and careful in these pagineries, that there is no disgrace meet between the Returns and you Pay Roll, or those who will be strict examining into it hereafter.\n\nI am & c.\n\n*[Signed]*\nEff.

具有结构化输出的 OCR

curl https://api.mistral.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -d '{
    "model": "pixtral-12b-2409",
    "messages": [
            {
                "role": "system",
                "content": [
                    {"type": "text",
                     "text" : "Extract the text elements described by the user from the picture, and return the result formatted as a json in the following format : {name_of_element : [value]}"
                    }
                ]
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "From this restaurant bill, extract the bill number, item names and associated prices, and total price and return it as a string in a Json object"
                    },
                    {
                        "type": "image_url",
                        "image_url": "https://i.imghippo.com/files/kgXi81726851246.jpg"
                    }
                ]
            }
        ],
    "response_format": 
      {
        "type": "json_object"
      }
  }'

模型输出

{'bill_number': '566548',
 'items': [{'item_name': 'BURGER - MED RARE', 'price': 10},
  {'item_name': 'WH/SUB POUTINE', 'price': 2},
  {'item_name': 'BURGER - MED RARE', 'price': 10},
  {'item_name': 'WH/SUB BSL - MUSH', 'price': 4},
  {'item_name': 'BURGER - MED WELL', 'price': 10},
  {'item_name': 'WH BREAD/NO ONION', 'price': 2},
  {'item_name': 'SUB POUTINE - MUSH', 'price': 2},
  {'item_name': 'CHK PESTO/BR', 'price': 9},
  {'item_name': 'SUB POUTINE', 'price': 2},
  {'item_name': 'SPEC OMELET/BR', 'price': 9},
  {'item_name': 'SUB POUTINE', 'price': 2},
  {'item_name': 'BSL', 'price': 8}],
 'total_price': 68}

常见问题

每张图片的价格是多少？

价格按照输入 token 的定价计算。

Pixtral

对于这两种 Pixtral 模型，每张图像将被分割成 16x16 像素的批次，每个批次转换为一个 token。根据经验法则，分辨率为“ResolutionX”x“ResolutionY”的图像将消耗大约 (ResolutionX/16) * (ResolutionY/16) 个 token。例如，一张 720x512 图像将消耗大约 (720/16) * (512/16) ≈ 1440 个 token。请注意，所有分辨率高于 1024x1024 的图像将在保持相同纵横比的情况下进行缩小。例如，一张 1436x962 图像将被缩小到大约 1024x686，消耗大约 (1024/16) * (686/16) ≈ 2600 个 token。

最终公式：Token 数量 ≈ (ResolutionX * ResolutionY) / 256

Small / Medium

Small 模型类似；然而，它不是使用 16 像素的批次，而是使用 14 像素的批次。最大分辨率不是 1024x1024，而是 1540x1540。由于其架构略有不同，它还只使用该 token 数量的 1/4 作为文本解码器的输入。这意味着总的来说，您可以将消耗量大致概括为 (ResolutionX/14) * (ResolutionY/14) * 1/4，这大约比 Pixtral 模型少 3 倍，因此使用更少的 token 且更高效。

最终公式：Token 数量 ≈ (ResolutionX * ResolutionY) / 784
我能对图像能力进行微调吗？

目前不支持对图像能力进行微调。
我能用它们生成图像吗？

不能，它们旨在理解和分析图像，而不是生成图像。
支持哪些类型的图像文件？

目前支持以下图像格式
- PNG (.png)
- JPEG (.jpeg 和 .jpg)
- WEBP (.webp)
- 仅包含一帧的非动画 GIF (.gif)
图像大小有限制吗？

当前文件大小限制为 10Mb。
每次请求的最大图像数量是多少？

通过 API 每次请求的最大图像数量是 8 张。
速率限制是多少？

有关速率限制的信息，请访问 https://console.mistral.ai/limits/。

具备视觉能力的模型：​

传递图像 URL​

传递 Base64 编码图像​

用例​

常见问题​

具备视觉能力的模型：

传递图像 URL

传递 Base64 编码图像

用例

常见问题