Egor Litvinenko

BPMN and blockchain

Egor — Wed, 20 Sep 2017 00:00:00 GMT

Most of you have heard of blockchain by now, so I will get straight to the point. I have worked on developing BPMN processes and the server that runs them. BPMN (Business Process Model and Notation) is a standard graphical notation for describing business processes (BPs) in BPM (Business Process Management). Traditionally, BPM is considered an organization's internal affair. In other words, large organizations create BPs to optimize management processes that are used internally. Sometimes BPs are used between two organizations when there is a high level of trust. BPs can be very, very large: too big to fit on a monitor or a sheet of drafting paper, consisting of many subprocesses, decision tables, and much more. Here is an example of a simple business process from Wikipedia:

And I began to think that the blockchain algorithm could find an application in optimizing business processes between different organizations. In that case, the organizations participating in a business process get a trustworthy data source that cannot be forged, and analysts get the ability to create BPs in a familiar way.

I see two applications here:

Creating blockchain contracts with business processes, as a replacement for programming;
Creating business processes in which critical data is saved to the blockchain (thereby inheriting its security and trustworthiness properties) and kept there as confirmation that the process was executed.

That is all for the introduction. Next I will describe the proof of concept I built, which shows that this can be implemented. For the demonstration I use an Ethereum test network, https://rinkeby.etherscan.io. This network is similar to the main one; the difference is only in the technical details of the algorithm, which, in short, is simpler. It was made specifically for developers. As the basis for the BPMN development environment I took Camunda. Camunda is one of the best-known makers of BPM tools in the world. For example, it is used at http://www.zalando.com/. I took their source code (it has an open license, don't worry, this is legal) and built a BP that runs on Camunda integrated with blockchain.

A BP on blockchain looks like this:

I would like to think that one day all organizations, both large and medium-sized, will interact in this way to reduce their costs and automate their processes while also making them transparent.

Self learning applications

Egor — Fri, 11 Aug 2017 00:00:00 GMT

Status: Article in progress

Prehistory

What is your batch size? When I ran tests that parse CSV and write to ClickHouse (CH; now on GitHub), I saw that I had some variables, the ring buffer size and the batch size of the ClickHouse driver, whose values I don't know a priori: the ring buffer size is tied to the CPU L3 cache size, and the batch size depends on many factors; even on my laptop it can change depending on logging settings. And I realized that we could have another family of applications, which define their parameters dynamically at runtime.

Think about it

Let's imagine what influences the batch size of an application. If your database is remote, then it is at least the network connection. If your database is local, then it is disk speed. What else? You could also mention branch prediction, cache coherence, false sharing, and other effects related to multithreaded processing, where batches reduce thread switches. What else? If you use compression to reduce IO, you should find the batch size as a golden mean between CPU load and compression. Or just row size. Really, when we set a batch size we generally don't think about any of that. There are many effects and reasons. And all of them can change from x86 to x64 or PowerPC, from node to node, or even on one node under a different performance profile and load.

Batch size is only one variable. For instance, if we create a parsing application that reads a file → parses → writes to a database in multiple threads (typically two threads with piped streams), there will be new parameters: the queue sizes for the threads. So we could have a set of parameters that influence one big task.
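To make the two parameters concrete, here is a minimal sketch of such a pipeline, under stated assumptions. The class name `ParsePipeline`, the `run` method, and the end-of-input marker are hypothetical and not taken from the original test project; an in-memory list stands in for the file, `trim()` stands in for parsing, and counting batches stands in for database writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical two-thread pipeline: the calling thread "parses" and a writer
// thread "writes". queueCapacity and batchSize are the tunable parameters.
class ParsePipeline {
    // Unique end-of-input marker, compared by identity so it never collides with real rows.
    private static final String END = new String("<end>");

    // Returns how many batches the writer flushed.
    static int run(List<String> lines, int queueCapacity, int batchSize) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        int[] batches = {0};

        Thread writer = new Thread(() -> {
            List<String> batch = new ArrayList<>(batchSize);
            try {
                while (true) {
                    String row = queue.take();
                    if (row == END) break;
                    batch.add(row);
                    if (batch.size() == batchSize) {
                        batches[0]++;   // a real writer would send the batch to the database here
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) batches[0]++; // flush the final partial batch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            for (String line : lines) {
                queue.put(line.trim()); // stand-in for parsing; blocks while the queue is full
            }
            queue.put(END);
            writer.join(); // join makes the writer's updates to batches[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return batches[0];
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(run(lines, 2, 2)); // 3 batches: [a, b], [c, d], [e]
    }
}
```

A small queue capacity makes the parser block more often, and a large one costs memory, so both numbers interact with the batch size rather than being independent knobs.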

So who knows what the batch size should be? The only answer is the runtime. Moreover, theoretically the runtime could determine which batch size is best in particular scenarios (when we have different load on the server, for instance).
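One very simple way to let the runtime decide could look like the sketch below. It is only an illustration under my own assumptions, not the algorithm this article is working towards: the class `BatchSizeTuner`, the `tune` method, and the doubling strategy are hypothetical, and the throughput function in `main` is synthetic rather than a real measurement.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical runtime tuner: keeps doubling the batch size while the
// measured throughput improves, and stops at the first non-improvement.
class BatchSizeTuner {
    // measure.applyAsDouble(batchSize) is assumed to return the throughput
    // (for example rows per second) observed with that batch size.
    static int tune(IntToDoubleFunction measure, int start, int max) {
        if (start < 1) throw new IllegalArgumentException("start must be >= 1");
        int best = start;
        double bestThroughput = measure.applyAsDouble(start);
        // candidate > 0 guards against integer overflow when doubling
        for (int candidate = start * 2; candidate > 0 && candidate <= max; candidate *= 2) {
            double throughput = measure.applyAsDouble(candidate);
            if (throughput <= bestThroughput) break; // no longer improving
            best = candidate;
            bestThroughput = throughput;
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic model: a fixed overhead of 100 units per batch, 1 unit per row,
        // and a penalty once a batch grows past 1024 rows.
        IntToDoubleFunction synthetic =
                n -> n / (100.0 + n + (n > 1024 ? 10.0 * (n - 1024) : 0.0));
        System.out.println(tune(synthetic, 64, 65536)); // prints 1024
    }
}
```

In a real application `measure` would run the workload for a short interval with the given batch size, and the search would have to be repeated when the load on the server changes.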

What do we have?

In modern life it makes sense to suppose that someone has already done it for us. I found these articles:

They talk about IoT and something close, but not the same. The difference is that the articles above are about collecting data from devices, while I'm talking about applications. Maybe from some point of view it is the same, because I also mean that we have some device (a computer) where we run an application and collect feedback. But the main difference is that it's not about machines only. People could also participate in this process. Also, in practice it might not be about “big data”. Big data needs a lot of experiments, whereas our dynamic algorithms should be responsive enough and appropriate even when we don't have much collected data. On the other hand, there are many runtime optimizations in the JIT and HotSpot too, but this article is about a higher layer.

Algorithm

Let's describe a simple model that represents and solves the batch size task (from the start).

Hello, world

Egor — Tue, 08 Aug 2017 00:00:00 GMT

public class HelloWorld {
    public static void main(String... args) {
        System.out.println("Hello, world");
    }
}

Speed table

Egor — Thu, 01 Jan 1970 00:00:00 GMT

Status: just a quick reference, to know where to look.

Table 1. Some speed examples
What	ns	µs
L1 cache reference	0.5	0.0005
Main memory reference	100	0.1
Compress 1K bytes w/ cheap algorithm	3,000	3
Compress 1K bytes with Zippy	3,000	3
Send 2K bytes over 1 Gbps network	20,000	20
SSD random read	150,000	150
Read 1 MB sequentially from memory	250,000	250
Round trip within same datacenter	500,000	500
Read 1 MB sequentially from network	10,000,000	10,000
Read 1 MB sequentially from disk	30,000,000	30,000
Send packet CA → Netherlands → CA	150,000,000	150,000
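
As a usage example, the numbers can be plugged into a back-of-envelope estimate. The sketch below takes the two sequential-read rows from the table (in nanoseconds) and scales them linearly to 1000 MB; the class and method names are hypothetical, and linear scaling is itself an assumption that ignores caching and seek effects.

```java
// Back-of-envelope estimate built on two rows of the table above.
class LatencyEstimate {
    static final long MEMORY_NS_PER_MB = 250_000L;   // read 1 MB sequentially from memory
    static final long DISK_NS_PER_MB = 30_000_000L;  // read 1 MB sequentially from disk

    // Estimated time in milliseconds to read the given number of megabytes sequentially.
    static double readMillis(long megabytes, long nsPerMb) {
        return megabytes * nsPerMb / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.println(readMillis(1000, MEMORY_NS_PER_MB)); // 250.0
        System.out.println(readMillis(1000, DISK_NS_PER_MB));   // 30000.0
    }
}
```

So, by these numbers, reading about a gigabyte takes roughly a quarter of a second from memory and roughly half a minute from disk.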

Sources: