Копирование событий с помощью clickhouse-copier

Для копирования событий с помощью утилиты clickhouse-copier необходимо создать файл с заданием на копирование task.xml и файл с подключением к ClickHouse-keeper (Zookeeper) client_config.xml.

task.xml
<clickhouse>
    <!-- Configuration of clusters as in an ordinary server config -->
    <remote_servers>
        <source_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                    <replica>
                        <host>127.0.0.1</host>
                        <port>9000</port>

                        <user>komrad</user>
                        <password>pass</password>
                        <secure>0</secure>

                    </replica>
            </shard>
        </source_cluster>

        <destination_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                    <replica>
                        <host>127.0.0.1</host>
                        <port>9000</port>

                        <user>komrad</user>
                        <password>pass</password>
                        <secure>0</secure>

                    </replica>
            </shard>
        </destination_cluster>
    </remote_servers>

    <!-- How many simultaneously active workers are possible. If you run more workers superfluous workers will sleep. -->
    <max_workers>2</max_workers>

    <!-- Setting used to fetch (pull) data from source cluster tables -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Setting used to insert (push) data to destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <!-- Common setting for fetch (pull) and insert (push) operations. Also, copier process context uses it.
         They are overlaid by <settings_pull/> and <settings_push/> respectively. -->
    <settings>
        <connect_timeout>3</connect_timeout>
        <!-- Sync insert is set forcibly, leave it here just in case. -->
        <insert_distributed_sync>1</insert_distributed_sync>
    </settings>

    <tables>
        <!-- A table task, copies one table. -->
        <table_hits>
            <!-- Source cluster name (from <remote_servers/> section) and tables in it that should be copied -->
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>komrad_events</database_pull>
            <table_pull>events</table_pull>

            <!-- Destination cluster name and tables in which the data should be inserted -->
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test</database_push>
            <table_push>events</table_push>

            <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(FROM_UNIXTIME(key_time)) ORDER BY (key_time, tenant_id, collector_type, collector_id) TTL FROM_UNIXTIME(key_time) + toIntervalDay(30) SETTINGS index_granularity = 8192</engine>

            <sharding_key>jumpConsistentHash(intHash64(key_time), 2)</sharding_key>
        </table_hits>
    </tables>
</clickhouse>
client_config.xml
<clickhouse>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <count>3</count>
    </logger>

    <zookeeper>
        <node index="1">
            <host>localhost</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>

Перед запуском команды на копирование необходимо запустить clickhouse-copier с помощью следующей команды:

clickhouse-keeper

Далее необходимо создать базу данных назначения (в примере базой данных назначения является test) с помощью нескольких команд:

clickhouse client -u komrad --password pass

create database test

После запуска ClickHouse-keeper и создания базы данных назначения можно запускать задание на копирование с помощью следующей команды:

clickhouse copier --config путь/до/client_config.xml --task-file путь/до/task.xml --task-path /home/astra

Для копирования событий на удаленный сервер необходимо в настройках удаленного сервера (/etc/clickhouse-server/config.yaml) открыть прослушивание адресов (раскомментировать строку):

# Listen specified address.
# Use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere.
# Notes:
# If you open connections from wildcard address, make sure that at least one of the following measures applied:
# - server is protected by firewall and not accessible from untrusted networks;
# - all users are restricted to subset of network addresses (see users.xml);
# - all users have strong passwords, only secure (TLS) interfaces are accessible, or connections are only made via TLS interfaces.
# - users without password have readonly access.
# See also: https://www.shodan.io/search?query=clickhouse
# listen_host: '::'

# Same for hosts without support for IPv6:
listen_host: 0.0.0.0

Далее в файле с заданием на копирование указать адрес удаленного получателя:

        <destination_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                    <replica>
                        <host>адрес удаленного сервера</host>
                        <port>9000</port>

                        <user>komrad</user>
                        <password>pass</password>
                        <secure>0</secure>

                    </replica>
            </shard>
        </destination_cluster>

Последнее изменение 11.05.2023