4.3.声明式编程

发表于 2024-05-31 更新于 2024-06-02 分类于 SICP in python ， Chapter 4.Data Processing 阅读次数：本文字数： 3.4k 阅读时长 ≈ 12 分钟

\(4.3\)声明式编程(\(declarative\;programming\))

一、概述

除了流(\(streams\))，数据的值也经常存储在一个叫作数据库(\(database\))的仓库里。

数据库由存入其中的数据和检索、转换这些数据的接口组成。每个存储在数据库中的值称作记录(\(record\))，有着相似结构的记录被归类为表格(\(tables\))。可以通过询问语言中的询问语句(\(queries\))来检索、转换记录。目前最普遍的询问语言被称作\(Structure\;Query\;Language(SQL)\)。

\(SQL\)是声明式编程语言的一种。在这种语言中，声明并不直接描述计算，而是描述计算想要得到的结果。而设计和运行一个计算过程来得到这个结果是询问解释器(\(query\;interpreter\))的工作。这种语言经常将过程性的细节(\(procedural\;details\))抽象，而专注于结果的形式(\(form\))。

二、\(SQL\)

1.表格

表格(\(table\))具有固定数量的命名列(\(named\;column\))和类型列(\(typed\;column\))。表的每一行表示一个数据记录，行中的每一列表示一个值。例如下面这个关于城市的表格：

一个只有一行的表格可以通过\(select\)语句创建。该行的每个值间用逗号隔开，列名称跟在关键词\(as\)后面。所有的\(SQL\)语句都以分号结尾：

1 2	sqlite> select 38 as latitude, 122 as longitude, "Berkeley" as name; 38\|122\|Berkeley

多行的表格可以通过\(union\)语句创建：

sqlite> select 38 as latitude, 122 as longitude, "Berkeley" as name union
   ...> select 42,             71,               "Cambridge"        union
   ...> select 45,             93,               "Minneapolis";
38|122|Berkeley
42|71|Cambridge
45|93|Minneapolis

可以通过\(create\;table\)语句来给表格命名：

sqlite> create table cities as
   ...>    select 38 as latitude, 122 as longitude, "Berkeley" as name union
   ...>    select 42,             71,               "Cambridge"        union
   ...>    select 45,             93,               "Minneapolis";

当一个表格被命名后，可以通过\(from\)语句调用这个名字。表格中的所有列可以通过select *形式访问：

sqlite> select * from cities;
38|122|Berkeley
42|71|Cambridge
45|93|Minneapolis

2.\(select\)语句

\(select\)语句通过列举(\(listing\))某一行的值，或者(更普遍地)从一个已存在的表格中，利用\(from\)句式投射(\(project\))来定义一个表格。该语句的一般格式如下：

1	select [column description] from [existing table name]

这里的\(column\;expression\)表示对原来表格的列的操作，如加减乘除等。

例如，我们可以用如下方法来创造一个两列的表格，这个表格描述了每个城市到\(Berkeley\)的距离：

sqlite> select name, 60*abs(latitude-38) from cities;
Berkeley|0
Cambridge|240
Minneapolis|420

为了方便后续被其它\(select\)语句所引用，在定义表格时给每一个列一个名字往往是必须的：

sqlite> create table distances as
   ...>   select name, 60*abs(latitude-38) as distance from cities;
sqlite> select distance/5, name from distances;
0|Berkeley
48|Cambridge
84|Minneapolis

\(a.\)\(Where\)语句

\(select\)语句也可以穿插一个表示过滤元素(\(filering\))的\(where\)语句。这个表达式将不符合\(where\)语句的行给筛序掉。只有符合筛选条件的行才会被用于构成一个新的表格：

sqlite> create table cold as
   ...>   select name from cities where latitude > 43;
sqlite> select name, "is cold!" from cold;
Minneapolis|is cold!

\(b.\)顺序语句

顺序语句(\(order\;clause\))包括一个排序表达式，所有未被过滤的行都会按照这个语句进行排序：

sqlite> select distance, name from distances order by -distance;
84|Minneapolis
48|Cambridge
0|Berkeley

\(select\)的这些特性功能使得\(select\)能够表示很多种类的行输入表格到输出表格的映射。

3.合并

\(a.\)合并的基本操作

众多数据可以通过合并(\(joining\))表格变成一个，这是数据库系统中一个基础的操作。

当表格被合并后，对于输入表格的每一个行，合并后的表格都会包括一个新的行。如果有两个表格进行合并操作，它们分别有\(m\)行与\(n\)行，那么合并后的表格就会有\(m\times n\)行。在\(SQL\)语言中，合并两个表格是通过\(from\)语句加上逗号隔开的表格名称实现的：

sqlite> create table temps as
   ...>   select "Berkeley" as city, 68 as temp union
   ...>   select "Chicago"         , 59         union
   ...>   select "Minneapolis"     , 55;

sqlite> select * from cities, temps;
38|122|Berkeley|Berkeley|68
38|122|Berkeley|Chicago|59
38|122|Berkeley|Minneapolis|55
42|71|Cambridge|Berkeley|68
42|71|Cambridge|Chicago|59
42|71|Cambridge|Minneapolis|55
45|93|Minneapolis|Berkeley|68
45|93|Minneapolis|Chicago|59
45|93|Minneapolis|Minneapolis|55

\(b.\)不同表格的关系

同样地，合并也可以通过\(where\)语句，表示两个表格间的关系。例如，我们希望将数据收集到一个表中，以获得每个城市自己的\(temp\)和\(lati\)。可以通过以下语句实现这个想法：

1
2
3

sqlite> select name, latitude, temp from cities, temps where name = city;
Berkeley|38|68
Minneapolis|45|55

\(c.\)消除歧义

为了消除名称相同的表格间的歧义，\(SQL\)允许我们使用关键字\(as\)为\(form\)语句提供别名，并利用点表达式来引用特定的表中的列。在下面计算不同城市温差的例子中，\(where\)语句的字母顺序约束确保了每个对在结果中只出现一次：

sqlite> select a.city, b.city, a.temp - b.temp
   ...>        from temps as a, temps as b where a.city < b.city;
Berkeley|Chicago|10
Berkeley|Minneapolis|15
Chicago|Minneapolis|5

三、\(SQL\)解释器

1.概括

为了创建一个\(SQL\)解释器，我们需要创造一个对表格的表示、对\(SQL\)语句的分析器，和对分析后语句的计算器。

在这个实现中，每个表格有自己的一个类，并且表格中的每行都由该类的一个实例来代表。一行的每一列都有一个属性，并且表格是列的序列。

2.\(table\)的创建

在\(python\)中，可以用\(namedtuple\)函数实现表格的类，它会返回一个新的\(tuple\)子类，这个子类会为元组中的每个元素给予名称。例如，先前的\(cities\)表格可以用如下\(python\)语句实现：

from collections import namedtuple
CitiesRow = namedtuple("Row", ["latitude", "longitude", "name"])
>>> cities = [CitiesRow(38, 122, "Berkeley"),
              CitiesRow(42,  71, "Cambridge"),
              CitiesRow(43,  93, "Minneapolis")]

3.\(select\)的实现

可以通过序列操作来实现\(select\)语句，例如先前生成的\(cities\)的\(distances\)的表格：

sqlite> create table distances as
   ...>   select name, 60*abs(latitude-38) as distance from cities;
sqlite> select distance/5, name from distances;
0|Berkeley
48|Cambridge
84|Minneapolis

这个表格是由\(cities\)的\(name\)列和\(latitude\)列生成的。该过程可以通过对输入表格的行用一个函数映射(\(mapping\;a\;function\))、然后返回一个新的行来实现：

>>> DistancesRow = namedtuple("Row", ["name", "distance"])
>>> def select(cities_row):
        latitude, longitude, name = cities_row
        return DistancesRow(name, 60*abs(latitude-38))
>>> distances = list(map(select, cities))
>>> for row in distances:
        print(row)
Row(name='Berkeley', distance=0)
Row(name='Cambridge', distance=240)
Row(name='Minneapolis', distance=300)

4.解释器的整体实现

\(SQL\)解释器的设计将上述方法普遍化了：

class Select:
        """select [columns] from [tables] where [condition] order by [order]."""
        def __init__(self, columns, tables, condition, order):
            self.columns = columns
            self.tables = tables
            self.condition = condition
            self.order = order
            self.make_row = create_make_row(self.columns)
        def execute(self, env):
            """Join, filter, sort, and map rows from tables to columns."""
            from_rows = join(self.tables, env)
            filtered_rows = filter(self.filter, from_rows)
            ordered_rows = self.sort(filtered_rows)
            return map(self.make_row, ordered_rows)
        def filter(self, row):
            if self.condition:
                return eval(self.condition, row)
            else:
                return True
        def sort(self, rows):
            if self.order:
                return sorted(rows, key=lambda r: eval(self.order, r))
            else:
                return rows

execute方法将输入的表格进行合并、过滤和排序。最后输出行调用的函数make_row是一个高阶函数，它为生成的表格创建了一个新的类，并且定义了如何从一个输入行(\(input\;row\))到一个输出行(\(output\;row\))：

def create_make_row(description):
        """Return a function from an input environment (dict) to an output row.
        description -- a comma-separated list of [expression] as [column name]
        """
        #seperate the string by commas and space
        columns = description.split(", ")
        expressions, names = [], []
        for column in columns:
        #if an 'as' is contained in the coloum,then divide it into expression and column name
            if " as " in column:
                expression, name = column.split(" as ")
         #else,view expression and name as the same
            else:
                expression, name = column, column
            expressions.append(expression)
            names.append(name)
        row = namedtuple("Row", names)
        #the lambda function output a named tuple,its value is received by calculating the expression in the current environment
        return lambda env: row(*[eval(e, env) for e in expressions])

最后，我们需要定义\(join\)函数。在提供env，包括由名字键(\(keyed\;by\;name\))控制的现有表格(以行列表的形式存在)，的情况下，\(join\)函数通过\(python\)的\(product\)函数，将这些行列表进行合并。对被合并的输出行，它用make_env函数映射，来得出对应字段的计算值：

from itertools import product
def join(tables, env):
        """Return an iterator over dictionaries from names to values in a row.
        tables -- a comma-separate sequences of table names
        env    -- a dictionary from global names to tables
        """
        names = tables.split(", ")
        joined_rows = product(*[env[name] for name in names])
        return map(lambda rows: make_env(rows, names), joined_rows)
>>> def make_env(rows, names):
        """Create an environment of names bound to values."""
        #transform a tuple into a dictionary
        env = dict(zip(names, rows))
        # add the name and its value into the dictionary
        for row in rows:
            for name in row._fields:
                env[name] = getattr(row, name)
        return env

上述的\(SQL\)解释器能够实现大部分\(SQL\)语句：

sqlite> select name, 60*abs(latitude-38) as distance
   ...>        from cities where name != "Berkeley" order by -longitude;
Minneapolis|420
Cambridge|240

>>> env = {"cities": cities}
>>> select = Select("name, 60*abs(latitude-38) as distance",
                    "cities", "name != 'Berkeley'", "-longitude")
>>> for row in select.execute(env):
        print(row)
Row(name='Minneapolis', distance=300)
Row(name='Cambridge', distance=240)

sqlite> select cities.name as name, distance, longitude
   ...>        from cities, distances where cities.name = distances.name;
Cambridge|240|71
Minneapolis|420|93

>>> env["distances"] = list(select.execute(env))
>>> joined = Select("cities.name as name, distance, longitude", "cities, distances",
                    "cities.name == distances.name", None)
>>> for row in joined.execute(env):
        print(row)
Row(name='Cambridge', distance=240, longitude=71)
Row(name='Minneapolis', distance=300, longitude=93)

四、递归\(select\)语句

1.\(with\)语句

\(with\)语句的格式如下：

1	with [tables] select [columns] from [names] where [condition] order by [order]

这里的\(tables\)部分是一个用逗号分隔开的列表，列表内是对表格的描述：

1	[table name]([column names]) as ([select statement])

任何\(select\)语句都可以用于描述表格。

下面的例子展示了\(with\)语句的使用：

sqlite> with
   ...>   states(city, state) as (
   ...>     select "Berkeley",  "California"    union
   ...>     select "Boston",    "Massachusetts" union
   ...>     select "Cambridge", "Massachusetts" union
   ...>     select "Chicago",   "Illinois"      union
   ...>     select "Pasadena",  "California"
   ...>   )
   ...> select a.city, b.city, a.state from states as a, states as b
   ...>        where a.state = b.state and a.city < b.city;
Berkeley|Pasadena|California
Boston|Cambridge|Massachusetts

在\(with\)语句中定义的表格可以包含一个简单的递归结构，通过其他的输出行来得出当前输出行。例如下面定义一个\(5\)到\(15\)间整数的表格：

sqlite> with
   ...>   ints(n) as (
   ...>     select 5 union
   ...>     select n+1 from ints where n < 15
   ...>   )
   ...> select n, n*n from ints where n % 2 = 1;
5|25
7|49
9|81
11|121
13|169
15|225

\(with\)语句还可以定义多重表格，表格间用逗号隔开。例如下面求解勾股数的表格：

sqlite> with
   ...>   ints(n) as (
   ...>     select 1 union 
            select n+1 from ints where n < 20
   ...>   ),
   ...>   squares(x, xx) as (
   ...>     select n, n*n from ints
   ...>   ),
   ...>   sum_of_squares(a, b, sum) as (
   ...>     select a.x, b.x, a.xx + b.xx
   ...>            from squares as a, squares as b where a.x < b.x
   ...>   )
   ...> select a, b, x from squares, sum_of_squares where sum = xx;
3|4|5
6|8|10
5|12|13
9|12|15
8|15|17
12|16|20

设计递归询问需要确保每个输入行都有足够的信息来计算结果行。例如下面求解斐波那契数的表格：

sqlite> with
   ...>   fib(previous, current) as (
   ...>     select 0, 1 union
   ...>     select current, previous+current from fib
   ...>     where current <= 100
   ...>   )
   ...> select previous from fib;
0
1
1
2
3
5
8
13
21
34
55
89

2.搭建字符串

两个字符串可以通过||操作符合并成一个更长的字符串：

sqlite> with wall(n) as (
  ....>   select 99 union select 98 union select 97
  ....> )
  ....> select n || " bottles" from wall;
99 bottles
98 bottles
97 bottles

这个操作可以用来通过短语来构建句子。例如，构建英语句子的一个方式是用主语、动词和宾语进行组合：

sqlite> create table nouns as
  ....>   select "the dog" as phrase union
  ....>   select "the cat"           union
  ....>   select "the bird";
sqlite> select subject.phrase || " chased " || object.phrase
  ....>        from nouns as subject, nouns as object
  ....>        where subject.phrase != object.phrase;
the bird chased the cat
the bird chased the dog
the cat chased the bird
the cat chased the dog
the dog chased the bird
the dog chased the cat

五、归并与分组

1.归并操作

\(select\)语句可以实现如\(max\)、\(min\)、\(count\)、\(sum\)等归并操作，它们分别返回行的最大最小值、行的数量和行内值的和：

sqlite> create table animals as
  ....>   select "dog" as name, 4 as legs, 20 as weight union
  ....>   select "cat"        , 4        , 10           union
  ....>   select "ferret"     , 4        , 10           union
  ....>   select "t-rex"      , 2        , 12000        union
  ....>   select "penguin"    , 2        , 10           union
  ....>   select "bird"       , 2        , 6;
sqlite> select max(legs) from animals;
4
sqlite> select sum(weight) from animals;
12056
sqlite> select min(legs), max(weight) from animals where name <> "t-rex";
2|20

count(*)语句用于计算对应量的行的数量：

sqlite> select count(legs) from animals;
6
sqlite> select count(*) from animals;
6
sqlite> select count(distinct legs) from animals;
2

2.\(group\;by\)和\(having\)语句

\(group\;by\)和\(having\)语句用于将表格的行分割为组的一个子集(\(subset\;of\;groups\))并只从这些子集中挑选一个。任何在\(having\)语句中的归并语句都会应用到表格的每一行，而不是一次性用于表格的所有行。例如下面的例子：

sqlite> select legs, max(weight) from animals group by legs;
2|12000
4|20
sqlite> select weight from animals group by weight having count(*)>1;
10

\(group\;by\)语句可以接受多个列，并将原来的表格按照这些列分组：

sqlite> select max(name) from animals group by legs, weight order by name;
bird
dog
ferret
penguin
t-rex
sqlite> select max(name), legs, weight from animals group by legs, weight
  ....>   having max(weight) < 100;
bird|2|6
penguin|2|10
ferret|4|10
dog|4|20
sqlite> select count(*), weight/legs from animals group by weight/legs;
2|2
1|3
2|5
1|6000

sqlite> select name, legs, max(weight) from animals group by legs;
t-rex|2|12000
dog|4|20

\(having\)语句可以和\(where\)语句一样过滤表格元素，还可以包含对归并元素的引用。可以按照如下原则选择两种语句：当过滤条件是对单一的行执行时使用\(where\)语句；当过滤条件设计归并操作时使用\(having\)语句。