LLM (Large Language Model) 大语言模型（主条目）

LLM (Large Language Model) 大语言模型（主条目）

2024-07-29. Category & Tags: AIGC, GPT, ChatGPT, LLM, Large Language Model, 大语言模型, 语言模型

See also (all LLM related posts & content):

/llm-benchmark: LLM Model benchmarks metrics & leaderboards
AI-Tools for online/handy AI tools for specific purposes, e.g. PowerPoint Slides, text2pic 文生图/视频平台, text2music 文生音乐平台, 数字人/虚拟主播平台 etc.
MCP for MCP servers’ aggregation platforms (MCP 聚合平台网站) (inc. Google A2A).
routers (model providers):
- OpenRouter.ai
- gemini 2.5 pro: free account 50~1000 requests/day
- requesty.ai
- gemini 2.5 pro: no limit
- SiliconFlow.cn 硅基流动
API translator:
- NewAPI
- OneAPI
backends:
- vLLM
- /llmflow: LLM WorkFlow
- /rag-agent-frameworks
- /chatglm
- /fastchat-vicuna
- /llamaindex
- llama factory
- /ollama
- llama.cpp
Deployment framework 大模型部署对比： Ollama vs. vLLM vs.LMDeploy
frontends:
- HuggingChat
- GPT4ALL
- LibreChat
- /open-webui
- localAI
datasets etc.
- Chinese NLP Data: 四大名著现代汉语版、古汉语版
multi-agent frameworks (below).

Interesting APPs #

Some simple examples/demos.

语音聊天机器人：【Open WebUI+Ollama/vLLM+CosyVoice+Whisper】终极个人聊天互动机器人-环境部署及成果展示
简单多模态：ollama+open-webui_知识库+多模态+文生图功能详解

Fine-tune vs. RAG vs. Prompt #

Fine-tune ≈ Learn a course
RAG ≈ open-book examination
Prompt ≈ ?

Several Tools to Run the LLM’s Model (itself) #

本地大模型启动 openai 服务的 N 种方式, vllm, fastchat, llama factory, llama.cpp, ollama

Chain of Thought / 思维链 #

狭义: Chain of Thought = COT
广义: 3 种: Chain of Thought = COT; Tree of Thought = TOT; Graph of Thought = GOT. 对应数据结构分别是: list, tree, graph.

RAG (Retrieval-Augmented Generation) Basics #

Typical RAG Question-answering System with Local-knowledgebase #

ref: 从基础 RAG 到 Agent: Llamaindex 助力大模型应用落地的思考与实践，稀土开发者大会 2024.6

typical workflow (steps/modules): #

Data Parsing & Input
Search

naive RAG is NOT good at: #

Summarization / 综述总结
Comparison / 对比分析
Implicit Data / 含蓄暗示（间接提示）
Multi-part Questions / 多分段综合问题

due to (Reasons):

single-short / 单回合
no plan / 无 Query 理解、步骤规划，或低级理解和规划
no tools / 无工具调用
no reflection / 无反思
no memory / 无(上下文)记忆

thus improvements (Agentic RAG / 主动式 RAG):

use external tools
add reflection
multi-turn / 多回合 (多次循环或者按需循环，直到满意或者达到最大次数)
add query 理解/规划层 (逐步给出足够复杂的解决方案)
add memory

Define: Agentic RAG (which has 4 components/steps): #

Reflect / 反思【相对成熟】
Tool Use / 工具使用【相对成熟】
Routing / 路由
Conversation Memory / 对话记忆
Query Planning / 规划

Agent #

See also:

langchain-ai/langgraph : langgraph examples, 2024 updating…, many, with videos on youtube
- Tool Use - Simple Data Analyst Agent with Cohere and Langchain , with videos on youtube, 2024.4
- Building a LangGraph ReAct Mini Agent: reasoning graph should NOT be complicated.
- (从官方翻译而来)手把手带你搭建 Agent 智能体！从零到一超详细原理微调讲解+代码解析项目实战，毛毛虫都能学清楚！—RAG,prompt,微调，Agent. (2.5h。重点看 React 框架如何解决问题，忽略 P6-Memroy 中的关于人的各种记忆的随意定义。三连评论换课件。)
Microsoft MS AutoGen examples, official, agent-focus
CrewAI examples, official, few

Sunny Define Agent: 基于 LLM 的代理人。

Define: Reflect / 反思 【相对成熟】

Agent 检查、评估自己的工作结果，并提出改进的方法. （可以有效提高 LLM 生成的内容质量）

Define: Tool Use / 工具使用 (to generate correct input for tools w.r.t. Query, e.g.) 【相对成熟】

e.g. 1: (Define Auto-Retrieval: ) convert query to keys which are used as Vector DB filters
e.g. 2: Text-to-SQL as SQL DB input
e.g. 3: generate api calls

Define: Routing / 路由 (which has 2 components):

sematic search / 语义搜索 by Vector Query which retrives top-k results
summarization / 归纳总结

Define: Conversation Memory / 对话记忆:

not only keeping conversation history
but also how to fuse info when the history is larger than input context window (compress, search etc.)

Define: Query Planning / 规划 【早期发展阶段】 (e.g. compare 2 companies income increment) :

Agent 对用户的目标进行拆解并执行（比如一篇分析报告，拆分为提纲、分段撰写、总结小标题等，但未必和用户既定路线完全一致）。通过连接工作流中不同的工具节点实现任务的精细编排和执行，编排难度较大， 能力上限较高，确定性较高。
e.g. plan: company A’s income & increment; company B’s income & increment; comparison

RAG 进阶策略 ( choose considering metrics ):

pic ref: 从基础 RAG 到 Agent: Llamaindex 助力大模型应用落地的思考与实践，稀土开发者大会 2024.6

Define: Multi-Agent Collaboration / 协作【早期发展阶段】：单个 agent 处理大量目标/子任务是超过 agent 的能力的，可以多个 AI Agent 协同工作，分工任务，讨论和辩论想法，提出比单个智能体更好的解决方案。需要关注复杂任务中的专家角色，而无需精确设计流程和协作关系，实现了对复杂任务的分支处理， 编排难度较小，结果的上限较高，但是不确定性较高。例如长文生成、逻辑话题等。

专 agent 专项任务
并行
还要考虑成本和响应时间

Define Llama Agents Framework 产品 (一个模块化的面向服务的分布式架构): 通过 “Control Plane” (包括: Orchestrator & Service Metadata) 生成各 Agent 的调用，通过 “Message Queue” 发送给对应的各个 agent, 比起 autogen、crewai 等，引入了人的反馈：

RAG 未来展望（LlamaIndex）：可观、可控、可定制。

See also: LlamaHub.ai for RAG components (mainly LlamaParse for Data Parsing & Input. Note: LlamaParse is online and requires api-key).

Frameworks Comparison #

Aspect	LangChain	~~LlamaIndex~~	Hugging Face Transformers	GPT-Index	DeepPavlov	PromptLayer	CrewAI
Prompt Engineering	Extensive support, custom prompt templates	Basic prompt management	Advanced prompt engineering capabilities	Focus on integrating prompts with index retrieval	Flexible prompt support	Advanced prompt engineering capabilities	Advanced prompt support and customization
Data Retrieval and Integration	Robust integration with various data sources	Focus on indexing and retrieval from multiple sources	Integration via datasets and APIs	Emphasizes retrieval-augmented generation (RAG)	Integration with various data sources and formats	Emphasis on data integration with prompt execution	Comprehensive data retrieval and integration capabilities
Model Orchestration and Chaining	Strong orchestration for complex workflows	Limited chaining capabilities	Orchestration through pipelines and workflows	Focuses on indexing and retrieval chains	Supports chaining and workflow management	Workflow management with observability features	Robust orchestration and chaining for complex tasks
Debugging and Observability	Good debugging tools, extensive logging capabilities	Basic logging and monitoring	Advanced logging and model monitoring	Debugging focused on retrieval issues	Logging and error handling capabilities	Integrated observability with real-time logging	Advanced debugging and observability tools
LLM Applications (RAGs)	Strong support for Retrieval-Augmented Generation	Basic support for RAGs	Supports RAGs through pipelines	RAGs are a core feature	Supports RAGs	Integrated RAG support with advanced features	Comprehensive support for RAGs and similar applications
Evaluations	Tools for evaluating prompt effectiveness	Basic evaluation capabilities	Comprehensive evaluation tools and metrics	Evaluation focuses on indexing accuracy	Evaluation tools for model performance	Detailed evaluation tools and metrics	Detailed evaluation tools and metrics
Production Readiness	Well-suited for production with enterprise features	Suitable for production with indexing capabilities	Production-ready with robust API support	Designed for production with focus on indexing	Production-ready with enterprise support	Well-suited for production with enterprise capabilities	Well-suited for production with enterprise features
Ecosystems and Integrations	Strong ecosystem with various integrations	~~Limited integrations, focus on indexing~~	Extensive ecosystem with numerous integrations	Focused integrations with RAG-centric tools	Good ecosystem with various integrations	Ecosystem includes integration with observability tools	Comprehensive ecosystem and integrations
Support (Documents, Tutorials, Community)	Extensive documentation, strong community support	~~Basic documentation~~, growing community	Comprehensive documentation, extensive community	Good documentation, active community	Detailed documentation, community support	Good documentation, community support, tutorials	Comprehensive documentation, strong community support

Warning: Crew AI collects anonymized usage data/info and reports to telemetry.crewai.com. Solution: Disable by faking it in OpenTelemetry lib.

References:

4o mini
LangChain vs. Alternatives:
Hugging Face Transformers:
- Hugging Face Transformers Overview
GPT-Index:
- GPT-Index GitHub
DeepPavlov:
- DeepPavlov Documentation
PromptLayer:
- PromptLayer Documentation
CrewAI:
- CrewAI Documentation

Frameworks in Details #

for fun & basic usage #

ChatBot Ollama

for development #

AutoGPT

MetaGPT

LangChain

Autogen Studio by MicroSoft

GraphRAG neo4j presentation in graphics

LLM for Scholar #

~~Elicit~~ ref ~~OpenRead~~ ref

LLM + Graph + Simple Local (English) #

MS GraphRAG + Ollama 本地部署

# 拉取quantinz模型
ollama pull quentinz/bge-base-zh-v1.5:latest

# 拉取gemma模型
ollama run gemma2:9b

# 展示模型列表
ollama list

Main refs:

Other refs:

GraphRAG+Ollama 实现本地部署: using llm model mistral + embedding model nomic-embed-text
medium: GraphRAG local setup via vLLM and Ollama : A detailed integration guide. using: Llama-3.1-8B + nomic-embed-text. (bak)
5 分钟手把手系列(二)：本地部署 Graphrag（Pycharm+Ollama+LM Studio）

faq #

Problem: “not answering in json” related problem. Hot (bypass) fix:

In graphrag/llm/openai/utils.py, replace result = json.loads(input) with:

        result_list = input.replace('** ',' ').split('* **')[1:-1]
        result = dict({'points': []})
        for one_result in result_list:
            result['points'].append({'description': one_result, 'score': 85})
        input = json.dumps(result)

LLM + Graph + Simple Local (Chinese) #

Main refs:

Airmomo/graphrag-practice-chinese

Other refs:

GraphRAG 架构和原理简介：GraphRAG 知识图谱检索增强生成

LLM + Graph + Local (+ Neo4j) #

构件图：使用 Neo4j 和 LangChain 实现“从本地到全局”的 GraphRAG
GraphRAG 部署流程及 Neo4j 展示》§ Neo4j 可视化, (bak)
ksachdeva/langchain-graphrag
YouTube Local GraphRAG + Langchain + local llm = Easy AI/Chat for your Docs

(Open-sourced/Free) Models, Agents, expecially for Coding #

VSCode + Cline (inc. Agent mode) #

Install Cline from VSCode official marketplace and click the extention icon to config AI: provider, api-key, model.

For SiliconFlow or other OpenAI-compatible APIs: Provider: “OpenAI Compatible”. Base URL: https://api.siliconflow.cn/v1 # NOTE: usually ends with v1 API Key: <my_own_key> Model: <select_a_model> Note, you need to specify the exact model’s name for compatible providers, as there is no list to select from. For Siliconflow, you can click to copy:

Then allow auto-approve of read, browser, MCP.

faq #

Problem: Shell 集成不可用 “Shell Integration Unavailable”（VSC 已是最新）。 Reason: ps < 7. Solution: https://zhuanlan.zhihu.com/p/25724740375 WARN: conda < 25 has a conflict with ps 7 “conda-script.py: error: argument COMMAND: invalid choice”. Update conda to solve it.

TODO #

Cinnamon/kotaemon: advantages: multi-model, docker, graph-display, customizable pipeline.

refs: