How to use a free, video-capable LLM under Ollama to understand what a video contains

April 30, 2026 / May 4, 2026

Introduction

Vision Language Models (VLMs) are multimodal AI models that can understand images and text together. Well-known examples include GPT-4o (OpenAI), LLaVA-NeXT-Video (ByteDance), and Gemini 1.5 Pro (Google), but open-source models such as InternVL, LLaVA-Video, Video-LLaMA, GLM-4.6V-Flash, and Qwen2.5-VL can also be used.

Models

ollama run blaifa/InternVL3_5:4B
ollama run ManishThota/llava_next_video
ollama run qwen2.5vl
ollama run gurubot/GLM-4.6V-Flash-GGUF:Q4_K_M
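To try the models above one after another without editing the program each time, the model tag could be read from an environment variable. The following is a minimal sketch under that assumption: `VLM_MODEL`, `KNOWN_MODELS`, and `pick_model` are hypothetical names that do not appear in the original program, and the default is the same tag the program assigns to `MODEL_NAME`.

```python
import os

# Tags taken from the "ollama run" commands listed above.
KNOWN_MODELS = [
    "blaifa/InternVL3_5:4B",
    "ManishThota/llava_next_video",
    "qwen2.5vl",
    "gurubot/GLM-4.6V-Flash-GGUF:Q4_K_M",
]


def pick_model(env=None):
    """Return the model tag to use.

    VLM_MODEL is a hypothetical environment variable introduced for this
    sketch; when it is unset, the first tag in KNOWN_MODELS is used.
    """
    env = os.environ if env is None else env
    name = env.get("VLM_MODEL", KNOWN_MODELS[0])
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown model tag: {name}")
    return name
```

In the program, `MODEL_NAME = pick_model()` would then replace the hard-coded tag.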

A Python program that breaks a video into frames and interprets it

Installation

pip install ollama

Program

import os
import subprocess
import ollama
from pathlib import Path

FFMPEG_PATH = r"D:\WinPython\ffmpeg\bin\ffmpeg.exe"
TEMP_DIR = r"Y:\temp_frames"
INPUT_DIR = r"Y:\gif"
OUTPUT_DIR = r"Y:\output"
MODEL_NAME = "blaifa/InternVL3_5:4B"


# ===== Frame extraction (shared by GIF and MP4) =====
def extract_frames(video_path):
    os.makedirs(TEMP_DIR, exist_ok=True)

    # Delete frames left over from a previous run
    for f in Path(TEMP_DIR).glob("*.jpg"):
        f.unlink()

    cmd = [
        FFMPEG_PATH,
        "-i", video_path,
        os.path.join(TEMP_DIR, "frame_%06d.jpg")
    ]

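    print("Running ffmpeg:", " ".join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # ffmpeg reports failure through a non-zero exit code
        print("ffmpeg failed with exit code", result.returncode)
    else:
        print("Frame extraction finished:", TEMP_DIR)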

    frames = sorted(str(p) for p in Path(TEMP_DIR).glob("*.jpg"))
    return frames


# ===== Analysis with InternVL3.5 =====
def analyze_images(image_paths, lang="ja"):
    if lang == "ja":
        prompt = """
あなたは映像解析の専門家です。
以下の画像は動画またはアニメーションの全フレームです。

【重要】
- 日本語で非常に詳しく説明する
- 背景（場所・環境・状況）を細かく描写
- 登場人物（性別・服装・髪型・表情・動作）を詳細に描写
- 物体（家具・道具・乗り物など）を詳しく説明
- 時系列で「何が起きているか」を丁寧に説明
- 内部思考（<think>）は絶対に出力しない
"""
    else:
        prompt = """
You are an expert in video scene analysis.
The following images are all frames of a video or animation.

Requirements:
- Describe everything in detailed English
- Describe background, environment, lighting, atmosphere
- Describe people: gender, clothing, hairstyle, expression, actions
- Describe objects and surroundings
- Explain the timeline of actions step-by-step
- Do NOT output chain-of-thought or internal reasoning
"""

    res = ollama.chat(
        model=MODEL_NAME,
        messages=[
            {
                "role": "user",
                "content": prompt,
                "images": image_paths
            }
        ]
    )

    return res["message"]["content"]


# ===== Detect the input file type and analyze it =====
def process_file(path):
    ext = Path(path).suffix.lower()
    base = Path(path).stem

    print(f"\n=== Processing: {path} ===")

    # GIF / MP4 -> extract frames
    if ext in [".gif", ".mp4"]:
        print("Processing as an animation or video.")
        frames = extract_frames(path)

    # JPG / PNG -> use the file as-is
    elif ext in [".jpg", ".jpeg", ".png"]:
        print("Processing as a still image.")
        frames = [path]

    else:
        print(f"Unsupported format: {ext}")
        return

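    # Nothing to analyze if frame extraction produced no images
    if not frames:
        print(f"No frames found, skipping: {path}")
        return

    # Create the output folder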
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    # Analysis in Japanese
    ja_text = analyze_images(frames, lang="ja")
    with open(Path(OUTPUT_DIR) / f"{base}_ja.txt", "w", encoding="utf-8") as f:
        f.write(ja_text)

    # Analysis in English
    en_text = analyze_images(frames, lang="en")
    with open(Path(OUTPUT_DIR) / f"{base}_en.txt", "w", encoding="utf-8") as f:
        f.write(en_text)

    print(f"Output written: {base}_ja.txt / {base}_en.txt")


# ===== Main =====
if __name__ == "__main__":
    files = list(Path(INPUT_DIR).glob("*.*"))

    print(f"Number of target files: {len(files)}")

    for f in files:
        process_file(str(f))

    print("\n=== All files processed ===")
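
The program passes every extracted frame to the model in a single request, which can mean a very large number of images for anything longer than a short GIF. The following is a minimal sketch, not part of the original program, of three helpers that could reduce that load and tidy the result: one builds an ffmpeg command that uses the `fps` video filter to extract fewer frames, one picks an evenly spaced subset from an existing frame list, and one removes any `<think>...</think>` block that the model emits despite the prompt. The names `build_ffmpeg_cmd`, `sample_frames`, and `strip_think`, and the defaults `fps=1.0` and `max_frames=8`, are assumptions chosen for illustration.

```python
import re
from pathlib import Path


def build_ffmpeg_cmd(ffmpeg_path, video_path, out_dir, fps=1.0):
    """Build an ffmpeg command that writes `fps` frames per second as JPEGs."""
    return [
        ffmpeg_path,
        "-i", str(video_path),
        "-vf", f"fps={fps}",  # ffmpeg's fps filter resamples the frame rate
        str(Path(out_dir) / "frame_%06d.jpg"),
    ]


def sample_frames(frames, max_frames=8):
    """Return at most max_frames items, evenly spaced, keeping first and last."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    if len(frames) <= max_frames:
        return list(frames)
    if max_frames == 1:
        return [frames[0]]
    # Spread the chosen indices evenly between 0 and len(frames) - 1.
    step = (len(frames) - 1) / (max_frames - 1)
    return [frames[round(i * step)] for i in range(max_frames)]


# Non-greedy match so that several separate think blocks are each removed.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text):
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return _THINK_RE.sub("", text).strip()
```

With these, `process_file` could call `sample_frames(extract_frames(path))` before analysis and wrap each result as `strip_think(analyze_images(frames, lang="ja"))`. How many frames a given model handles well is not stated here, so `max_frames` would need to be tuned per model.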

Posted by eightban