Review of digital technologies. Part 2

Contents

Blockchain, smart contracts
Machine vision
Robotic Process Automation (RPA)
3D printing
Lakes and data warehouses
Computational storage (CS)

This is the second post in a series. The first part is available here

Blockchain, smart contracts

Many of you have heard this term and associate it primarily with Bitcoin and cryptocurrencies.

But that is only one special case of its application.

Blockchain is a method of storing and processing information in which every participant in the network keeps a copy of all the data, and any change is replicated to all participants. Each new block of data is linked to the previous one, typically by including the previous block's hash.

https://interecnook.ru/vidy/uskoritel-tranzakcij-bitkoin.html

Why such complexity?

To guarantee authenticity. This approach makes it practically impossible to correct information or make changes retroactively without being noticed.
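The linking of blocks can be illustrated with a minimal sketch. It assumes SHA-256 hashes and a plain Python list as the ledger; the function names (`block_hash`, `add_block`, `is_valid`) and the sample transactions are hypothetical, and a real network would add consensus, signatures and replication between participants.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a block that is linked to the last block in the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
    return True


ledger = []
add_block(ledger, "Alice pays Bob 5")
add_block(ledger, "Bob pays Carol 2")
valid_before = is_valid(ledger)

# A retroactive edit of an old block no longer matches its stored hash.
ledger[0]["data"] = "Alice pays Bob 500"
valid_after = is_valid(ledger)
print(valid_before, valid_after)
```

Even if the editor also recomputes the hash of the changed block, the next block still points at the old hash, so the check keeps failing until every later block is rewritten on every participant's copy.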

This technology is needed if we do not trust the database administrator. Or to eliminate the risks of unfair execution of contracts between companies.

But this technology also has disadvantages. They follow from its very principle (at least in massive blockchain networks): millions of PCs process the same data.

1. It is very energy inefficient. As the network grows, a single transaction consumes an enormous amount of energy.

2. It requires ever-increasing amounts of storage and processing power.

3. Low system performance + transaction complexity = limited application.

https://zen.yandex.ru/media/id/5c3e0247bf238900a9aa99fe/smartkontrakty-prosto-o-slojnom-5ee914bba3dca453cfdd2c42

Where blockchain technologies can be used:

Organization of voting and elections
Maintaining registers, for example, real estate. Public administration
Creation of smart contracts where it is necessary to eliminate the risks of litigation
Digital identity, authentication and proof of access rights
Copyright protection
Internet of Things
Casino, computer games
Exchange and trading management

Machine vision

Machine vision is a branch of computer vision.

If computer vision is a general set of techniques that allow computers to see, then machine vision is computer vision applied to manufacturing.

It combines a digital video signal with a neural network.

Machine vision can solve problems such as:

Recognition
Identification
Detection
Text recognition
Restoring 3D shape from 2D images
Motion estimation
Scene restoration
Image recovery
Identification of structures of a certain type in images, image segmentation
Optical Flow Analysis

Control of the production process and robot using machine vision https://www.invision-news.com/invision/3d-machine-vision-market-study/amp/

Applications of machine vision cover various areas of activity, for example:

Large industrial production
Accelerated production of unique products
Security systems, including industrial ones in factories
Control of pre-fabricated objects (e.g. quality control, error investigation)
Visual control and management systems (accounting, barcode reading)
Control of automated vehicles
Quality control and food inspection
Monitoring and preventing the development of emergency events.

A real-world example of machine vision is quality control at the Kamaz PJSC plant, where the algorithm measures geometric parameters accurately, objectively and without fatigue.
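A rough idea of what "determining geometric parameters" means can be given with a toy sketch. It deliberately swaps the neural network for simple brightness thresholding, and everything in it is an assumption made for illustration: the tiny grayscale frame, the `MM_PER_PIXEL` calibration, the nominal size and the tolerance. It is not the method used at the plant.

```python
def bounding_box(image, threshold=128):
    """Return (width_px, height_px) of the region brighter than the threshold."""
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [c for c in range(len(image[0])) if any(row[c] > threshold for row in image)]
    if not rows or not cols:
        return (0, 0)
    return (cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1)


def within_tolerance(measured_mm, nominal_mm, tol_mm):
    """True if the measured size deviates from nominal by at most tol_mm."""
    return abs(measured_mm - nominal_mm) <= tol_mm


# Hypothetical 6x8 grayscale frame: a bright 4x3 part on a dark background.
frame = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 200, 210, 205, 220, 0, 0],
    [0, 0, 215, 230, 225, 210, 0, 0],
    [0, 0, 205, 220, 215, 200, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
MM_PER_PIXEL = 0.5  # assumed camera calibration

width_px, height_px = bounding_box(frame)
width_mm = width_px * MM_PER_PIXEL
height_mm = height_px * MM_PER_PIXEL

# Assumed nominal size 2.0 x 1.5 mm with a tolerance of 0.1 mm.
part_ok = within_tolerance(width_mm, 2.0, 0.1) and within_tolerance(height_mm, 1.5, 0.1)
print(width_mm, height_mm, part_ok)
```

The same check runs identically on every frame, which is where the "objectively and without fatigue" property comes from.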

More real-life examples can be found here

The limitations here, as usual, are price and people.

The price is gradually decreasing, but people continue to resist.

The prospects for the technology are excellent, since implementation is becoming increasingly cheaper, and the practical result can be seen and felt quickly; there are countless application scenarios.

Robotic Process Automation (RPA)

RPA (robotic process automation) is a form of business process automation in which software robots use the user interface to collect data and operate applications.

In traditional systems, a developer creates a list of actions to automate a task using programming interfaces (APIs) or a scripting language. RPA systems build the list of actions by observing how the user performs that task in the application's graphical user interface.

For example, a robot can scan an email, understand what the request is about, prepare and send the necessary package of documents to the responsible employees.
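The email scenario can be sketched as a small rule table. This is only an illustration under stated assumptions: the keywords, mailboxes and document names in `RULES` are hypothetical, and a real RPA tool would read the mailbox and attach files through its own connectors or the mail client's interface rather than through a function like `triage`.

```python
# Hypothetical routing rules: keyword -> (request type, responsible mailbox, documents)
RULES = {
    "invoice": ("billing", "accounting@example.com", ["invoice_copy.pdf", "payment_details.pdf"]),
    "contract": ("legal", "legal@example.com", ["contract_template.docx"]),
    "vacation": ("hr", "hr@example.com", ["leave_request_form.pdf"]),
}
# Anything the rules do not recognize is handed over to a human operator.
FALLBACK = ("unknown", "operator@example.com", [])


def triage(subject, body):
    """Classify an email by keyword and prepare the package to send."""
    text = f"{subject} {body}".lower()
    for keyword, (kind, recipient, documents) in RULES.items():
        if keyword in text:
            return {"type": kind, "to": recipient, "attachments": list(documents)}
    kind, recipient, documents = FALLBACK
    return {"type": kind, "to": recipient, "attachments": list(documents)}


task = triage("Request for invoice copy", "Please send the invoice for order 1042.")
unclear = triage("Hello", "Can someone call me back?")
print(task)
print(unclear)
```

The fallback branch matters as much as the rules: requests the robot cannot classify go to a person instead of being guessed at.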

https://cloudnetworks.ru/analitika/uspeh-robotizatsii/

What types of processes can be automated with RPA:

Repeatable, simple and standardizable actions
The process is carried out by many employees
A monotonous process for which instructions already exist
Relatively high standardization of incoming data
Possibility of autonomous execution

https://www.itweek.ru/idea/article/detail.php?ID=206102

What does this mean for business:

Reducing costs for routine operations
Fewer errors in processes, higher quality and speed of their implementation
Possibility of economical business scaling
Reducing business risks
Shifting the focus of employees to performing intellectual tasks

Below is an example of the effect of “relieving” an employee for 30 days on a specific business process: digitizing PDF documents and entering them into the database.

A little more about the effects:

https://terralink.ru/articles/upravlenie-biznes-kontentom/robot-vmesto-cheloveka-pochemu-biznesu-vazhno-vnedryat-rpa-/

And the most popular areas of activity for RPA implementation:

https://nangs.org/news/technologies/pochti-polovina-kompaniy-v-moskve-vnedrila-sistemy-robotizatsii-biznes-protsessov

Robotization has 2 main competitors:

1. Manual labor.

If a process has a large number of branches and exceptions, or often requires human judgment, it is better to leave it manual.

2. Classic business process automation.

Classic automation can win when the work to be automated lives in a single system. When more than one system is involved in the process, however, robotization significantly outperforms classic automation.

Advantages of RPA over classical automation:

Ease of implementation: robotizing one process takes about 2 months, and once robotization is put on stream this can be cut to 2 weeks. There are examples of simple robots developed in 3 working days. RPA works through the user interface and does not require a highly qualified developer. It is enough for the developer to watch how a subject-matter specialist works with several systems, and the robot can then repeat this without anyone diving into the specifics of the APIs. As for the objection that not all data is displayed in the user interface, the question is: is that data needed? If your employees already combine information from several systems without access to hidden technical data, then the robot does not need it either.
Quick effect: about 6 months to return on investment (ROI).
Minimal changes to existing IT systems, since robots work through the user interface.
No integration work. With classic automation, each system has its own rules for working with its API: there are subtleties with authorization, tokens, secret keys, and the order in which functions must be called to achieve the desired result. When you need to “make friends” between two, three or more systems, you cannot do without a very expensive developer who will write and debug the integration for several months.

https://www.it.ru/services/detail.php?ID=13454

It is also generally accepted that there are 4 generations of RPA tools:

RPA 1.0 – Requires human intervention

Goal: Helps improve the productivity of a specific employee

How it works: the program is installed directly on the employee’s PC or laptop

Limitations: This is partial automation of manual operations, which is difficult to scale

RPA 2.0 – Does not require human intervention

Goal: complete automation of the entire process, emulation of human participation in the process (for example, a robot signs instead of a person)

How it works: a server is allocated on which a platform for organizing the work of robots is installed. There, the process and the roles of robots in the process are configured, that is, each robot is assigned a task, what it will do in the process. A process launch schedule is configured, and a set of analytical screens (dashboards) and reporting appears to monitor the effectiveness of the process.

Limitations: you need to manually ensure that all robots start and do not break. Manually make adjustments to their scripts and schedules. It's tedious, laborious and not very interesting.

RPA 3.0 – Autonomous Robot

Goal: eliminate human work in setting up and maintaining robots. One robot monitors another. At this level you can automate an entire workshop or department.

How it works: most often delivered as part of a comprehensive cloud solution with all the necessary infrastructure. It is an IT solution that does not require people and can independently record and analyze deviations.

Limitations: the risk of breakdowns still cannot be completely excluded, because not everything can be foreseen and analyzed in advance. One example is data that arrives without structure or an explicit format: handwritten or loosely structured text, or browser data such as search history and cookies from the websites you visit. An RPA 3.0 class system will not be able to recognize it.

RPA 4.0 – Cognitive (Smart) RPA

Goal: completely eliminate human influence, so that the robot learns and develops by itself.

How it works: essentially an improved version of RPA 3.0; the main difference is a built-in neural network that imitates human thinking.

Limitations: the difficulties start with development and configuration. It is not enough to be able to analyze business processes. You need to understand mathematics and be able to build models. This means either time spent on training or the higher cost of hiring expensive specialists.

https://www.cfo-russia.ru/images/111/1/risunok_3.jpg

Main advantages of robotization:

1. Shifting the focus of employees to performing intellectual tasks

Employees can focus on more intellectual work that adds value rather than on repetitive, routine tasks. Freed from mechanical work, people have time to realize their potential, which in lean manufacturing terms means eliminating waste. Solving complex, non-trivial problems is motivating, so labor productivity in such a team increases and the company starts doing “useful” work.

2. Fewer errors in processes

Robots do not make data-entry mistakes. A person can mistype text or numbers, but a robot always follows the specified instructions and never makes a typo.

3. Reducing internal costs for standard operations

A robot is a digital employee who can handle work that can be done according to instructions. It performs operations such as:

Click on the buttons.
Copy and recognize text.
Paste copied text into other systems and forms.
Build reports.
Perform actions in applications.
Work with scanned documents. Robots can use third-party text recognition engines to then work with text.
Work with databases and government systems.
Send messages in instant messengers.
Perform other simple and routine operations.

In addition, the robot:

Performs operations 10–20 times faster than a human.
Works 24/7.
Helps business grow without increasing staff.

4. Business intelligence is more reliable and easily accessible.

Every transaction made using RPA is recorded in a log. Using this data, you can comprehensively analyze any completed processes.
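What such an analysis might look like can be sketched as follows. The log format (process name, duration in seconds, status) and the sample entries in `LOG` are hypothetical; real RPA platforms have their own log schemas and dashboards.

```python
from collections import defaultdict

# Hypothetical robot log: (process, duration in seconds, status)
LOG = [
    ("pdf_to_database", 12.0, "ok"),
    ("pdf_to_database", 15.0, "ok"),
    ("pdf_to_database", 40.0, "failed"),
    ("email_triage", 2.0, "ok"),
    ("email_triage", 4.0, "ok"),
]


def summarize(log):
    """Aggregate run count, failure rate and average duration per process."""
    stats = defaultdict(lambda: {"runs": 0, "failed": 0, "total_seconds": 0.0})
    for process, seconds, status in log:
        entry = stats[process]
        entry["runs"] += 1
        entry["total_seconds"] += seconds
        if status != "ok":
            entry["failed"] += 1
    return {
        name: {
            "runs": s["runs"],
            "failure_rate": s["failed"] / s["runs"],
            "avg_seconds": s["total_seconds"] / s["runs"],
        }
        for name, s in stats.items()
    }


report = summarize(LOG)
for name, row in report.items():
    print(name, row)
```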

5. A human remains available for non-standard requests.

If necessary, the robot can ask a person for help and wait for the answer.

6. Reducing the cost of entry into digitalization.

RPA algorithms allow you to automate interactions with legacy systems involved in a business process, which, in turn, eliminates the need to immediately replace automated systems or programs.

Limitations:

1. RPA requires digitized processes and data, with as little information as possible kept on paper. The data structure has to be worked out.

2. You need to convince your people that this will lead not to layoffs but to more useful and interesting work, which makes the company more sustainable and their own future more secure.

3. Since RPA simply copies user actions and interacts directly with the system's interface, the robot depends directly on the speed and stability of the target system.

Our opinion is that for small and medium-sized businesses this is one of the most promising technologies right now. It lets such a business grow for relatively little money, without increasing payroll costs, expanding the office, and so on.

3D printing

3D printing is a subtype of additive technologies.

Additive technologies build up and synthesize objects layer by layer.

In that sense, 3D printing has a lot in common with casting, sculpting, etc.

For a technology to be classified as "3D printing", the final product must be built from raw material, such as powder, rather than from blanks. The formation of objects must also be arbitrary, that is, without the use of molds. The latter means that additive manufacturing requires a software component.
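The software component is essentially a slicer: it cuts the digital model into layers that the printer then builds one by one. A minimal sketch for the simplest possible model, a sphere, is shown below. The function name `slice_sphere` and the chosen dimensions (10 mm radius, 200 micron layers) are assumptions for illustration; real slicers work on arbitrary meshes and also generate tool paths and supports.

```python
import math


def slice_sphere(radius_um, layer_um):
    """Return (z_mid, cross_section_radius) in microns for each layer, bottom to top."""
    layer_count = -(-2 * radius_um // layer_um)  # ceiling division on integers
    layers = []
    for i in range(layer_count):
        z_mid = (i + 0.5) * layer_um       # height of the layer centre above the bed
        offset = z_mid - radius_um         # vertical distance from the sphere centre
        r = math.sqrt(max(radius_um ** 2 - offset ** 2, 0.0))
        layers.append((z_mid, r))
    return layers


# A sphere with a 10 mm radius printed in 200 micron layers.
layers = slice_sphere(radius_um=10_000, layer_um=200)
widest = max(r for _, r in layers)
print(len(layers), round(widest, 1))
```

Each tuple describes one printed layer as a disc, so the smooth sphere is approximated by a stack of thin cylinders; thinner layers give a closer approximation at the cost of more layers to print.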

What advantages does this technology provide:

Savings. Unique parts can be manufactured without complex reconfiguration or replacement of equipment and without changes to the process. In other words, parts can be customized for each customer with virtually no restrictions on complexity. All you need is a digital model.
Speed. You do not have to spend time reconfiguring equipment.
Quality. The product will almost certainly have the required dimensions and no hidden defects.

https://3dprintingindustry.com/?p=191176

https://080formacion.es/impresoras-3d-y-otros-encantos/

Areas of use:

For rapid prototyping, that is, the rapid production of prototypes of models and objects for further development. Already at the design stage, you can radically change the design of a node or an object as a whole. In engineering, this approach can significantly reduce costs in the production and development of new products.
For rapid production - the production of finished parts from materials supported by 3D printers. This is an excellent solution for small-scale production.
Production of models and molds for foundry production.
Production of transparent models that show the operation of a mechanism “from the inside”. Porsche engineers, in particular, used this to study the oil flow in a car’s transmission during development.
Production of various small items at home.
Production of complex, massive, durable and inexpensive systems. For example, most of the parts of Lockheed's Polecat unmanned aircraft were manufactured using high-speed 3D printing.
Manufacturing of medicines, prosthetics and organs.
Construction of buildings and structures.
Creation of weapon components (Defense Distributed). There are also experiments on printing entire weapons.
Production of housings for experimental equipment (cars, telephones, electronic equipment).

Having covered the advantages, we should also mention the disadvantages:

Relatively low accuracy of 100-200 microns, since the height of the printed layer varies. If you need products with very tight tolerances, this manufacturing method is not the best.
Geometric inaccuracy due to shrinkage.
For some geometries, internal support structures cannot be removed after printing.
Non-uniform material (anisotropy). Because the material is bonded layer by layer, its structure differs from that of a cast part, delamination becomes possible, and the mechanical properties are reduced. The product becomes more fragile: with the same geometric dimensions, 3D-printed parts will not withstand the same load as parts made using traditional methods. This is the main disadvantage of the technology.
Small selection of materials. Although 3D printing already uses a fairly wide range of materials, there are still many more casting and molding plastics, and their properties cover a wider range.
High production cost for large series. A mold is expensive, but if thousands of identical products are needed, each one will be cheaper to produce by casting.
Slow mass production. Casting is much faster than 3D printing.
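The cost argument in the list above is a break-even calculation: casting has a fixed mold cost plus a low unit cost, printing has no mold but a higher unit cost. A sketch with purely hypothetical prices (`MOLD`, `CAST`, `PRINT` are invented numbers, not market data):

```python
import math


def total_cost_casting(n, mold_cost, cast_unit_cost):
    """Mold is paid once, then every part costs cast_unit_cost."""
    return mold_cost + n * cast_unit_cost


def total_cost_printing(n, print_unit_cost):
    """No tooling, every part costs print_unit_cost."""
    return n * print_unit_cost


def break_even_quantity(mold_cost, cast_unit_cost, print_unit_cost):
    """Smallest batch size at which casting is no more expensive than printing."""
    if print_unit_cost <= cast_unit_cost:
        return None  # printing never loses on unit cost, so there is no break-even
    return math.ceil(mold_cost / (print_unit_cost - cast_unit_cost))


# Hypothetical prices in arbitrary currency units.
MOLD, CAST, PRINT = 5000, 2, 12

n_star = break_even_quantity(MOLD, CAST, PRINT)
print(n_star)
print(total_cost_casting(100, MOLD, CAST), total_cost_printing(100, PRINT))
```

Below the break-even quantity printing is cheaper, which is why the technology suits prototypes and individual orders; above it the mold pays for itself.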

Our opinion is that this is a fairly niche technology, primarily for use in development work, or for the production of individual orders.

Lakes and data warehouses

We have reviewed a lot of technologies, and we already understand that the key criterion in digitalization and digital transformation is working with data, end-to-end analytics, assistance and support in decision making.

But we did not consider the most important thing - where should all the data be stored, how to work with it?

Conventionally, there are 2 approaches to storing and processing data - a data warehouse or a data lake.

Data warehouse

A data warehouse is data aggregated from different sources into a single central repository that unifies it in quality and format.

Data scientists can use data from the warehouse in areas such as data mining, artificial intelligence (AI), machine learning and, of course, business intelligence.

Data warehouses can be used in large cities to collect electronic transaction information from various departments, including data on speeding fines, excise duties, etc.

Warehouses can also be used by developers to collect terabytes of data generated by automotive sensors, which helps them make the right decisions when developing autonomous driving technologies.

This technology uses the ETL approach to data processing, described in more detail below.

A special case of a data warehouse, the data mart, is also worth noting.

https://myslide.ru/presentation/it-v-professionalnoj-deyatelnosti

https://data.korusconsulting.ru/press-center/blog/khranilishche-dannykh-kak-osnova-sozdaniya-korporativnoy-sistemy-biznes-analitiki/

Data mart

A data mart is a slice of a data warehouse on a specific topic, intended for a specific circle of users in a company or its division.

A data mart can be used by a manufacturing company's marketing department to identify target audiences when developing marketing plans. It can also be used by the manufacturing department to analyze productivity and error rates to enable continuous process improvement. Data sets in a data mart are often used in real time for analytics and actionable insights.
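One simple way to picture a data mart is as a view over the warehouse that exposes only one department's slice. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `warehouse_sales`, the view `marketing_mart` and the sample rows are hypothetical, and production marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The central warehouse table holds data for all departments.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("North", "marketing", "campaign_a", 120.0),
        ("South", "marketing", "campaign_b", 80.0),
        ("North", "production", "line_1", 300.0),
    ],
)

# The mart exposes only the slice that the marketing department needs.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, product, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, product, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```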

Data lake

A data lake is a large repository of raw data, both unstructured and semi-structured. Data is collected from various sources and simply stored. It is not modified for a specific purpose or converted into any particular format. Analyzing this data requires extensive preliminary preparation, cleaning, and formatting to make it homogeneous.

Data lakes are excellent resources for city governments and other organizations that store information related to infrastructure outages, traffic, crime, or demographics. The data can be used later to adjust the budget or to review the resources allocated to public utilities or emergency services.

Data lakes use the ELT (extract, load, transform) approach to data processing.

If you collect too much data “just in case” and do not work with it in any way, the lake can become a useless swamp. Therefore, it is important to determine in advance why exactly you are collecting data rather than just accumulating it, and to inventory and review it periodically.

https://news.myseldon.com/ru/news/index/222183397

The key difference between data lakes and regular databases is structure. Databases store only clearly structured data, while lakes store unstructured, unsystematized and unordered data.

Data Processing Approaches

3 key stages: E, T, L.

Extraction (E): retrieving raw data from the source systems or a pool of unstructured data and moving it to temporary intermediate storage.
Transformation (T): structuring, enriching and converting the raw data so that it matches the target store.
Loading (L): loading the structured data into a data warehouse for analysis and use by business intelligence (BI) tools.

In ETL, a tool extracts data from different source systems, transforms it (applying calculations, concatenations, etc.) and then loads it into the data warehouse.

In ETL, data flows from a source to a target, and the transformation engine takes care of any data changes.
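The ETL order can be sketched in a few lines. This is an illustration under assumptions: the source rows, the `orders` table and the `extract`/`transform`/`load` function names are hypothetical, and an in-memory SQLite database stands in for the warehouse. The transformation (a concatenation and a calculation) runs in application code before anything reaches the warehouse.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"first_name": " anna ", "last_name": "Ivanova", "price": "10.50", "qty": "2"},
    {"first_name": "Boris", "last_name": " petrov", "price": "3.00", "qty": "5"},
]


def extract():
    """E: pull raw rows from the source into a staging list."""
    return list(SOURCE_ROWS)


def transform(rows):
    """T: clean, concatenate and calculate outside the warehouse."""
    result = []
    for row in rows:
        customer = f"{row['first_name'].strip().title()} {row['last_name'].strip().title()}"
        total = float(row["price"]) * int(row["qty"])
        result.append((customer, total))
    return result


def load(conn, rows):
    """L: write only the finished, structured rows into the warehouse."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
etl_result = warehouse.execute(
    "SELECT customer, total FROM orders ORDER BY customer"
).fetchall()
print(etl_result)
```

Note that the raw values never reach the warehouse: if the transformation rules change, the data has to be extracted and processed again.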

https://stepik.org/course/72726/syllabus

With ELT, once data extraction is complete, you immediately begin the loading phase: moving data from all sources into a single, centralized data store. Transformation happens afterwards, inside that store.
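A minimal sketch of the ELT order, again with an in-memory SQLite database standing in for the central store. The table `raw_orders`, the view `orders` and the sample rows are hypothetical. Raw values are loaded unchanged, and the transformation is expressed later as SQL inside the store.

```python
import sqlite3

# Hypothetical raw rows: untrimmed names, numbers still stored as text.
RAW_ROWS = [
    (" anna ", "Ivanova", "10.50", "2"),
    ("Boris", " petrov", "3.00", "5"),
]

store = sqlite3.connect(":memory:")

# Extract + Load: raw values go into the central store unchanged.
store.execute(
    "CREATE TABLE raw_orders (first_name TEXT, last_name TEXT, price TEXT, qty TEXT)"
)
store.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ROWS)

# Transform: done afterwards, inside the store, by its own SQL engine.
store.execute(
    """
    CREATE VIEW orders AS
    SELECT TRIM(first_name) || ' ' || TRIM(last_name) AS customer,
           CAST(price AS REAL) * CAST(qty AS INTEGER) AS total
    FROM raw_orders
    """
)

elt_result = store.execute("SELECT customer, total FROM orders ORDER BY total").fetchall()
raw_count = store.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(elt_result, raw_count)
```

Because the raw rows stay in the store, a new transformation can be added as another view or query without extracting the data again.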

Comparison of ETL and ELT according to 8 criteria:

1. Time - Loading

ETL: uses a staging area and a separate system, so loading takes extra time

ELT: everything is in one system, and data is loaded only once

2. Time - Transformation

ETL: you need to wait, especially with large amounts of data; as the data grows, the transformation time increases

ELT: everything is in one system, and speed does not depend on data size

3. Time - Maintenance

ETL: high maintenance effort; you must select the data to load and transform, and redo everything if data is deleted or you want to improve the main data store

ELT: low maintenance effort; all data is always available

4. Complexity of implementation

ETL: requires less space at an early stage, and the results are clean

ELT: requires in-depth knowledge of the tools and expert design of the underlying large storage

5. Data warehouse support

ETL: the predominant legacy model, used for on-premises, relational, structured data

ELT: adapted to scalable cloud infrastructure that supports structured and unstructured big data sources

6. Data lake support

ETL: not part of the approach

ELT: enables a lake with support for unstructured data

7. Ease of use

ETL: fixed tables, fixed timeline, used mainly by IT

ELT: ad hoc, flexible, accessible to everyone from developer to citizen integrator

8. Profitability

ETL: unprofitable for small and medium businesses

ELT: scalable and accessible to businesses of any size through online SaaS solutions

Final Thoughts on ETL and ELT

ETL is becoming outdated. It helped overcome the limitations of traditional, rigid data center infrastructures, but today that is no longer an issue. In organizations with multi-terabyte data sets, ETL loading can take hours, depending on the complexity of the transformation rules.

ELT is an important part of the future of data warehousing. With ELT, companies of all sizes can benefit from modern technology. By analyzing large pools of data with greater flexibility and lower maintenance costs, companies gain key insights to create real competitive advantage in their business.

Summary

A data lake stores all data regardless of its source and structure, while a data warehouse stores data as quantitative metrics with their attributes.
A data lake is a repository that holds huge amounts of structured, semi-structured and unstructured data, while a data warehouse combines technologies and components that enable the strategic use of data.
A data lake defines the schema after the data is stored, whereas a data warehouse defines the schema before the data is stored.
A data lake uses the ELT (extract, load, transform) process, while a data warehouse uses the ETL (extract, transform, load) process.
A data lake is ideal for those who want to conduct in-depth analysis, while a data warehouse is ideal for operational users.
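The point about defining the schema before or after storing can be illustrated with a small sketch. Everything in it is an assumption for illustration: the JSON events, the `payments` table and the `read_temperatures` helper. The warehouse side checks records against a fixed schema at load time, while the lake side keeps every raw line and applies a schema only when a question is asked.

```python
import json
import sqlite3

# Hypothetical raw events arriving from different sources.
events = [
    '{"customer": "u1", "amount": 10.0}',
    '{"customer": "u2", "amount": "n/a", "note": "manual entry"}',
    '{"device": "sensor-7", "temperature": 21.5}',
]

# Warehouse: the schema is fixed before loading (schema-on-write).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE payments (customer TEXT NOT NULL, amount REAL NOT NULL)")
rejected = []
for line in events:
    record = json.loads(line)
    if isinstance(record.get("customer"), str) and isinstance(record.get("amount"), (int, float)):
        warehouse.execute(
            "INSERT INTO payments VALUES (?, ?)", (record["customer"], record["amount"])
        )
    else:
        rejected.append(record)  # does not fit the schema, so it is not stored

# Lake: every raw line is kept as-is; a schema is applied only on read.
lake = list(events)


def read_temperatures(raw_lines):
    """Interpret the raw lines as sensor readings at query time."""
    return [rec["temperature"] for rec in map(json.loads, raw_lines) if "temperature" in rec]


stored_payments = warehouse.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(stored_payments, len(rejected), len(lake), read_temperatures(lake))
```

The warehouse answers its one question cleanly but has discarded two of the three events; the lake kept everything, at the price of doing the interpretation work later.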

For those who want to dive deeper into this area, we recommend taking this course. Below are notes from this course.

Our opinion is that data lakes are relevant when it is impossible or too difficult for a company to collect data and transform it into a single format, and when there is a high degree of uncertainty about what you will look for and analyze and what decisions will need to be made. If the company is industrial and everything is clear, definite and structured, it is easier to work with a data warehouse.

For a data lake, constant work with the data and regular review of sources is also critical, otherwise the lake may turn into a swamp.

For the most advanced, we can recommend a hybrid: a lake plus a warehouse. Technically this is quite feasible.

Computational storage (CS)

Computational storage is a relatively new technology designed to improve application performance and reduce the load on server hardware.

The essence of the technology is processing data directly where it is stored, for example on a computer’s hard drive. Compute and storage resources are combined so that data-processing tasks run locally on the drive. This reduces the load on the central processor of the "host" computer, such as a data warehouse or remote server, which can then work on higher-priority tasks. It also reduces the amount of data being moved. As a result, data processing latency drops, data transmission channels are relieved, security improves and energy consumption falls.

How does it work? Thanks to advances in microelectronics, it is now possible to place a small processor in each drive to process data directly on that drive. In a traditional system, the host that wants to process some data first requests it from storage. In the new scheme, the host does not request the data but instead sends the drive a command to perform a specific operation on the data in place. This is the future of cloud computing, data storage and data centers.
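The difference between the two schemes can be shown with a toy simulation that counts how many bytes leave the drive. The `Drive` class, the record size and the sensor data are all hypothetical; real computational storage devices expose their own command sets, which this sketch does not attempt to model.

```python
class Drive:
    """A toy drive holding fixed-size records; counts bytes sent to the host."""

    RECORD_BYTES = 100  # assumed size of one stored record

    def __init__(self, records):
        self.records = records
        self.bytes_sent = 0

    def read_all(self):
        """Traditional path: ship every record to the host."""
        self.bytes_sent += len(self.records) * self.RECORD_BYTES
        return list(self.records)

    def run_filter(self, predicate):
        """Computational-storage path: filter on the drive, ship only the matches."""
        matches = [r for r in self.records if predicate(r)]
        self.bytes_sent += len(matches) * self.RECORD_BYTES
        return matches


def is_hot(record):
    return record["temperature"] >= 65


# Hypothetical sensor readings with temperatures cycling through 20..69.
readings = [{"sensor": i, "temperature": 20 + (i % 50)} for i in range(1000)]

# Traditional scheme: the host requests all data and filters it itself.
traditional = Drive(readings)
host_result = [r for r in traditional.read_all() if is_hot(r)]

# New scheme: the host sends the operation, the drive returns only the result.
smart = Drive(readings)
drive_result = smart.run_filter(is_hot)

print(traditional.bytes_sent, smart.bytes_sent, host_result == drive_result)
```

Both schemes produce the same answer, but in this toy case the second moves a tenth of the data across the channel, which is where the latency and energy savings come from.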
