How CHESS Works
CHESS paper: [2405.16755] CHESS: Contextual Harnessing for Efficient SQL Synthesis
How CHESS Is Implemented
(1) Information Retriever (IR)
Retrieves the entities (database values) and context (schema descriptions from the database catalog) relevant to the question.
The Information Retriever (IR) agent aims to retrieve the relevant entities (values in the database) and context (schema descriptions provided in the database catalog). To achieve this, we present scalable and efficient methods using locality-sensitive hashing to retrieve database values from millions of rows, leveraging keyword detection, and vector databases to extract contextual information from database catalogs. Our approach utilizes both semantic and syntactic similarities between the database content and the user’s query to enhance the retrieval quality.
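The paper does not prescribe a concrete implementation here; as a rough illustration of the syntactic side only, the sketch below uses MinHash-based locality-sensitive hashing (via the third-party `datasketch` library) to index database values by character trigrams and probe the index with keywords from the question. All names and data in the sketch are illustrative, not CHESS's actual code.

```python
# Illustrative sketch of LSH-based value retrieval (not CHESS's actual code).
# Requires: pip install datasketch
from datasketch import MinHash, MinHashLSH

def ngrams(text: str, n: int = 3):
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def minhash(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for g in ngrams(text):
        m.update(g.encode("utf-8"))
    return m

# Index every (table, column, value) triple once, offline.
values = [
    ("frpm", "County Name", "Alameda"),
    ("frpm", "County Name", "Alpine"),
    ("schools", "City", "Alameda"),
]
lsh = MinHashLSH(threshold=0.5, num_perm=128)
for i, (_, _, value) in enumerate(values):
    lsh.insert(str(i), minhash(value))

# At query time, probe the index with keywords detected in the question.
keyword = "Alameda County".replace(" County", "")  # toy keyword detection
hits = [values[int(key)] for key in lsh.query(minhash(keyword))]
print(hits)  # candidate (table, column, value) matches for "Alameda"
```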
(2) Schema Selector (SS)
Further narrows the schema down to the tables and columns most relevant to the question.
The goal of the Schema Selector (SS) agent is to reduce the schema size by selecting only the necessary tables and columns required for generating the SQL query. To achieve this, the SS agent is equipped with three tools: filter column, select tables, and select columns.
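As a minimal illustration of what this pruning produces (not the agent's actual tool implementations), the sketch below reduces a full schema to only the tables and columns that an upstream selection step decided to keep; the schema contents are illustrative.

```python
# Illustrative sketch of schema pruning (not the SS agent's actual tool code).
full_schema = {
    "frpm": ["CDSCode", "County Name", "Free Meal Count (K-12)", "Enrollment (K-12)"],
    "schools": ["CDSCode", "City", "Zip", "Phone"],
    "satscores": ["cds", "AvgScrMath", "AvgScrRead"],
}

def prune_schema(schema: dict, keep: dict) -> dict:
    """Keep only the tables/columns chosen by the selection steps."""
    return {
        table: [col for col in cols if col in set(keep[table])]
        for table, cols in schema.items()
        if table in keep
    }

# Hypothetical output of select_tables / select_columns for question 0:
selected = {"frpm": ["County Name", "Free Meal Count (K-12)", "Enrollment (K-12)"]}
print(prune_schema(full_schema, selected))
```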
(3) Candidate Generator (CG)
Generates multiple candidate SQL queries.
The Candidate Generator (CG) is responsible for synthesizing a SQL query that answers the question asked of the database. To accomplish this, the CG agent first calls the generate candidate query tool to generate candidate queries. It then executes these candidates on the database, checks the results, and identifies any faulty queries (those containing syntactic errors or returning empty results). For each faulty candidate, the agent repeatedly attempts to revise the query until the issue is resolved or a maximum number of allowed revisions is reached.
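A minimal sketch of this execute-check-revise loop, using sqlite3. The functions `generate` and `revise_query` stand in for the LLM calls and are assumptions, not the project's actual functions.

```python
# Illustrative sketch of the Candidate Generator's revise loop (LLM calls stubbed out).
import sqlite3

MAX_REVISIONS = 3

def execute(db_path: str, sql: str):
    """Run a query; return (rows, error_message)."""
    try:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(sql).fetchall()
        return rows, None
    except sqlite3.Error as exc:
        return None, str(exc)

def revise_query(question: str, sql: str, error):
    """Stub for the LLM-based revision call; returns the query unchanged here."""
    return sql

def refine(db_path: str, question: str, candidates: list) -> list:
    """Keep revising candidates that raise errors or return empty results."""
    final = []
    for sql in candidates:
        for _ in range(MAX_REVISIONS):
            rows, err = execute(db_path, sql)
            if err is None and rows:               # executable and non-empty: accept
                break
            sql = revise_query(question, sql, err)  # hypothetical LLM revision
        final.append(sql)
    return final
```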
(4) Unit Tester (UT)
Evaluates the generated SQL queries with natural-language unit tests.
The Unit Tester (UT) agent is responsible for selecting the most accurate SQL query from the pool of candidates generated by the CG agent. UT identifies the best candidate by 1) generating multiple unit tests that highlight differences between the candidate queries and 2) evaluating the candidates against these unit tests. It then assigns a score to each query based on the number of unit tests it passes, selecting the top-scoring candidate as the final SQL query for the given question.
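A sketch of the scoring idea: each natural-language unit test is checked against every candidate (here by a stubbed judge function), and the candidate passing the most tests is chosen. The `passes` judge is an assumption standing in for an LLM evaluation, not the project's actual code.

```python
# Illustrative sketch of unit-test-based candidate selection (LLM judge stubbed out).
def passes(candidate_sql: str, unit_test: str) -> bool:
    """Stub: in CHESS this judgment is made by an LLM; always True here."""
    return True

def pick_best(candidates: list, unit_tests: list) -> str:
    scores = {
        sql: sum(passes(sql, test) for test in unit_tests)
        for sql in candidates
    }
    # The candidate that satisfies the most unit tests is the final answer.
    return max(scores, key=scores.get)
```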
Case Study
Take california_school_0 as an example. Its entry in the BIRD test set contains the following fields (a small loading sketch follows the list):
- "question_id" is the index of the question in the test set; the set contains 1,534 questions in total.
- "db_id" is the name of the database.
- "evidence" is a hint given to the model to help it produce the answer.
- "SQL" is the gold query used to obtain the correct result.
- "difficulty" is BIRD's difficulty rating for the question: "simple" < "moderate" < "challenging".
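A small sketch of how such an entry can be inspected. The file name dev.json, its path, and the presence of a "question" field holding the natural-language question are assumptions about how the BIRD data is laid out locally.

```python
# Illustrative sketch: inspect one BIRD entry (file layout is an assumption).
import json

with open("dev/dev.json", encoding="utf-8") as f:
    questions = json.load(f)          # a list of question entries

entry = questions[0]                  # question_id 0, db_id "california_schools"
for field in ("question_id", "db_id", "question", "evidence", "SQL", "difficulty"):
    print(field, "=", entry.get(field))
```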
We now walk through california_school_0.json and california_school_0.log step by step.
First, the user provides the prompt, which is recorded in the log.
By analyzing the user's request, the model produces the following reasoning:
To find the highest eligible free rate for K-12 students in schools located in Alameda County, we need to calculate the ratio of `Free Meal Count (K-12)` to `Enrollment (K-12)` for each school and then determine the maximum value of this ratio. The database schema indicates that these fields are located in the `frpm` table, and the county information is stored in the `County Name` column. The query filters records where `County Name` is 'Alameda' to focus on schools in Alameda County. The calculation for the eligible free rate is performed by dividing `Free Meal Count (K-12)` by `Enrollment (K-12)`, and the maximum value of this calculation is obtained using the MAX function. The query correctly handles the division by casting `Free Meal Count (K-12)` to REAL to ensure accurate division results. The query also ensures that only the maximum value of the calculated ratio is returned, adhering to the instruction to output only the information asked in the question.
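Putting that reasoning into a query, the revised SQL should be equivalent to something like the sketch below. This is reconstructed from the reasoning above, not copied from the log, so the exact text stored in revised_SQL may differ; the database file path is an assumption about the local BIRD layout.

```python
# Query reconstructed from the reasoning above (not copied from the log).
import sqlite3

sql = """
SELECT MAX(CAST(`Free Meal Count (K-12)` AS REAL) / `Enrollment (K-12)`)
FROM frpm
WHERE `County Name` = 'Alameda'
"""

db_path = "dev_databases/california_schools/california_schools.sqlite"  # assumed path
with sqlite3.connect(db_path) as conn:
    print(conn.execute(sql).fetchone())  # the highest eligible free rate
```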
The query the model produces is written into the revised_SQL field.
The same question is asked two to three times, after which the program judges whether each answer is correct.
Each evaluation record contains the following fields:
- "GOLD_SQL" is the reference answer from the BIRD dataset.
- "PREDICTED_SQL" is the answer produced by the model.
- "exec_res" is the judgment result: 1 if the result is correct, 0 otherwise.
- "exec_err" is the cause of the error; "--" is returned if the result is correct.
- "evaluation_for" is the stage being evaluated: evaluation_for: nl_to_sql means the evaluation checks the accuracy of the natural-language-to-SQL conversion, while evaluation_for: revise_sql means it checks whether the revision of the original SQL query is correct.
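A rough sketch of how such a record could be produced: execute both queries and compare their result sets, which yields exec_res and exec_err as described above. This mirrors the fields in the log, not necessarily the project's exact evaluation code; the error strings are assumptions.

```python
# Illustrative sketch of producing an evaluation record (not the project's exact code).
import sqlite3

def evaluate(db_path: str, gold_sql: str, predicted_sql: str, stage: str) -> dict:
    record = {"evaluation_for": stage, "exec_res": 0, "exec_err": "--"}
    try:
        with sqlite3.connect(db_path) as conn:
            gold = set(conn.execute(gold_sql).fetchall())
            pred = set(conn.execute(predicted_sql).fetchall())
        record["exec_res"] = int(gold == pred)   # 1 if the result sets match
        if not record["exec_res"]:
            record["exec_err"] = "incorrect answer"
    except sqlite3.Error as exc:
        record["exec_err"] = str(exc)            # syntax or execution error
    return record
```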
There are also logs where an answer was initially wrong but was corrected in the end.
In the provided error_logs, the logs of incorrect answers have been filtered out; by searching for **chain_of_thought_reasoning** you can inspect the chain of thought and try to pinpoint why the model went wrong.
In addition, starting from page 29, the paper lists some incorrect answers together with the reasons for the errors.
failed_summary.json
The CHESS project publishes the intermediate results it produced when running the BIRD test set, stored at CHESS/results/chess_on_dev.zip at chess-v1 · ShayanTalaei/CHESS. After merging these results, all records containing mistakes were filtered out and collected in failed_summary.json.
To make them easier to find and localize, every failed record carries a separate failure_annotations tag, so the tag can be searched to quickly locate the cause of the problem (a query sketch follows at the end of this section).
For example, one failed record is annotated with a "missing_column" problem.
Another record is marked as having failed at the "nl_to_sql" stage.
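Below is a sketch of how the failed records could be queried by tag and inspected. The field names failure_annotations and chain_of_thought_reasoning come from the files described above, while the surrounding structure of failed_summary.json (a list of JSON objects with a question_id field) is an assumption.

```python
# Illustrative sketch: locate failures by tag and inspect their chain of thought.
import json

with open("failed_summary.json", encoding="utf-8") as f:
    failures = json.load(f)           # assumed to be a list of failed records

missing_column = [
    item for item in failures
    if "missing_column" in item.get("failure_annotations", [])
]
for item in missing_column:
    # The chain of thought often reveals where the reasoning went wrong.
    print(item.get("question_id"), item.get("chain_of_thought_reasoning"))
```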