This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Appendix

A Further Discussion on Set Operators

Based on the rules discussed on Section 3.4.1, NatSQL can simplify the SQL with INTERSECT (example is shown in Figure 1) and EXCEPT. As to the case that the set operator itself represents part of a condition, NatSQL allows them to follow the WHERE keyword. As illustrated in Table 1, this type of SQL is mainly related to the EXCEPT operator.

The prediction NatSQL prediction work in Table 1 is easier than others. NatSQL here only needs to predict the ‘cartoon’ table, instead of predicting the ‘cartoon.channel’ column. Predicting a table is easier than predicting a column because the premise of finding the correct column is to find the correct table. Besides, many models output ‘cartoon.id’ instead of ‘cartoon.channel’ because the annotation of ‘cartoon.id’ is the same as‘tv_channel.id’ column.

Ques Find the id of tv channels that do not play any cartoon
SQL SELECT id FROM tv_channel EXCEPT
SELECT channel FROM cartoon
SemQL SELECT tv_channel.id EXCEPT
SELECT cartoon.channel
NatSQL SELECT tv_channel.id WHERE except cartoon.*
Table 1: An example of none WHERE conditions before the IUE.

In addition to the conditions mentioned in Section 3.4.1 that cannot be concatenated, Table 2 present one more example. They can not concatenate because one WHERE condition can not concatenate a HAVING condition by a OR operator.

Ques Which film is rented at a fee of 0.99 or has less
than 3 in the inventory?
SemQL SELECT film.title WHERE film.rental_rate = 0.99
UNION
SELECT film.title WHERE count(inventory.*)<< 3
NatSQL SELECT film.title WHERE film.rental_rate = 0.99
OR count(inventory.*)<< 3
Table 2: An example modified from that in Figure 5.

B Further Discussion on Executable SQL Generation

In Section 3.6, we discuss the different questions in Figure 5 generate different NatSQL, where training data is the key. Firstly, in the dataset, for SQL with multiple WHERE conditions, the order of the conditions is mostly consistent with the question. Secondly, the NatSQL further expands this type of training data. For example, the NatSQL queries in Figure 1,3,4 contain multiple WHERE conditions, which appear in the order they are mentioned. These training data make it possible for models to generate different NatSQL according to the different questions in Figure 5.

C Gold NatSQL Error Analysis

Table 3 presents the F1 score of NatSQL for different SQL components. We observe that the main errors come from GROUP BY and IUE matching. Although NatSQL cannot be converted to all gold GROUP BY clauses, most of these errors don’t affect the execution results. The IUE errors occur because NatSQL only supports one IUE operator per query.

Component F1 Component F1
select 0.997 where 0.969
group 0.879 order 0.996
and/or 0.998 IUE 0.900
keywords 0.989
Table 3: Partial matching F1 score of NatSQL on the Spider development set.

Some other errors are due to the limitation of the exact match evaluation method when evaluating the JOIN ON clause of subqueries and sub-subqueries. Specifically, when the FROM and JOIN in a generated subquery is not identical to the gold SQL, the Spider evaluation scheme considers it to be wrong. For example, the following two SQL statements have the same semantic meaning, but they are recognized as different by the Spider exact match evaluation method, thus results in an exact match error.
…  col in ( SELECT col FROM T1 JOIN T2 … )
…  col in ( SELECT col FROM T2 JOIN T1 … )

D SQL, SemQL and NatSQL Examples

We present more examples in Table 4.

Ques: What are the name of the countries where there is not a single car maker?
SQL: SELECT CountryName FROM countries EXCEPT SELECT T1.CountryName FROM countries AS T1 JOIN car_makers AS T2 ON T1.countryId = T2.Country;
SemQL: Not Support
NatSQL: SELECT countries.countryname WHERE except @ is car_makers.*
Ques: Find the last name of the students who currently live in the state of North Carolina but have not registered in any degree program.
SQL: SELECT T1.staff_name FROM staff AS T1 JOIN Staff_DA AS T2 ON T1.staff_id = T2.staff_id WHERE T2.job_title_code = ”Sales Person” EXCEPT
SELECT T1.staff_name FROM staff AS T1 JOIN Staff_DA AS T2 ON T1.staff_id = T2.staff_id WHERE T2.job_title_code = ”Clerical Staff”
SemQL: SELECT staff.staff_name WHERE Staff_DA.job_title_code = ”Sales Person” EXCEPT
SELECT staff.staff_name WHERE Staff_DA.job_title_code = ”Clerical Staff”
NatSQL: SELECT staff.staff_name WHERE Staff_DA.job_title_code = ”Sales Person”
AND Staff_DA.job_title_code != ”Clerical Staff”
Ques: Find id of the tv channels that from the countries where have more than two tv channels.
SQL: SELECT id FROM tv_channel GROUP BY country HAVING count(*) >> 2
SemQL: SELECT tv_channel.id WHERE count ( tv_channel.* ) >> 2
NatSQL: SELECT tv_channel.id WHERE count ( tv_channel.* ) >> 2
Ques: List all song names by singers above the average age.
SQL: SELECT song_name FROM singer WHERE age >> ( SELECT avg(age) FROM singer )
SemQL: SELECT singer.song_name WHERE singer.age >> ( SELECT avg(singer.age) )
NatSQL: SELECT singer.song_name singer WHERE @ >> avg ( age )
Ques: Which district has both stores with less than 3000 products and stores with more than 10000 products?
SQL: SELECT district FROM shop WHERE Number_products << 3000 INTERSECT SELECT district FROM shop WHERE Number_products >> 10000
SemQL: SELECT shop.district WHERE shop.Number_products << 3000 INTERSECT SELECT shop.district WHERE shop.Number_products >> 10000
NatSQL: SELECT shop.district WHERE shop.number_products << 3000 and shop.number_products >> 10000
Table 4: SQL, SemQL and NatSQL examples from the Spider.