# Abstracts of Accepted Contributed Talks
Talk 01: Multiscale Reweighted Stochastic Embedding (MRSE): Deep Learning of Generalized Variables for Statistical Physics
Jakub Rydzewski
We present a new machine learning method called multiscale reweighted stochastic embedding (MRSE) [1] for automatically constructing generalized variables that represent and drive the sampling of statistical physics systems, in particular in enhanced sampling simulations. The technique finds generalized variables automatically by using a deep neural network to learn a low-dimensional embedding that maps the high-dimensional feature space to the latent space. Our work builds upon the popular t-distributed stochastic neighbor embedding approach [2]. We introduce several new aspects to stochastic neighbor embedding algorithms that make MRSE especially suitable for enhanced sampling simulations: (1) a well-tempered landmark selection scheme; (2) a multiscale representation of the high-dimensional feature space; and (3) a reweighting procedure to account for biased training data. We demonstrate the performance of MRSE by applying it to several model systems.
[1] J. Rydzewski and O. Valsson. J. Phys. Chem. A (2021) 125, 6286.
[2] L. van der Maaten and G. Hinton. J. Mach. Learn. Res. (2008) 9, 2579.
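The sketch below makes the reweighting idea concrete. It builds row-normalised Gaussian affinities between high-dimensional samples and multiplies each neighbour by a statistical weight w = exp(beta * V), where V is the bias potential. It then averages the affinities over several bandwidths as a simple stand-in for the multiscale representation. This is a minimal sketch under stated assumptions, not the MRSE implementation from [1]. The function names, the per-neighbour weighting and the plain average over bandwidths are illustrative choices. Landmark selection and the neural-network embedding are omitted.

```python
import numpy as np

def reweighted_affinities(X, bias, beta, sigma):
    """Row-normalised Gaussian affinities p(j|i), each neighbour j weighted by exp(beta * V_j).

    Assumption: the statistical weight multiplies the kernel of neighbour j."""
    w = np.exp(beta * (bias - bias.max()))  # shift by the max for numerical stability
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma**2)) * w[None, :]
    np.fill_diagonal(K, 0.0)  # a point is not its own neighbour
    return K / K.sum(axis=1, keepdims=True)

def multiscale_affinities(X, bias, beta, sigmas):
    """Average of single-scale affinities over several bandwidths (assumed mixture)."""
    return np.mean([reweighted_affinities(X, bias, beta, s) for s in sigmas], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 samples of a 3-dimensional feature space
bias = rng.uniform(0.0, 2.0, size=20)  # bias potential at each sample (hypothetical units)
P = multiscale_affinities(X, bias, beta=1.0, sigmas=[0.5, 1.0, 2.0])
```

A constant bias leaves the affinities unchanged, as expected, because only relative weights matter after row normalisation.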
Talk 02: Let's open the black box! Hessian-based toolbox for interpretable and reliable machines learning physics
Anna Dawid (University of Warsaw, ICFO), Patrick Huembeli (École polytechnique fédérale de Lausanne), Michał Tomza (University of Warsaw), Maciej Lewenstein (ICFO Barcelona), Alexandre Dauphin (ICFO Barcelona)
Identifying phase transitions is one of the key problems in quantum many-body physics. The challenge is that the complexity of describing a quantum system grows exponentially with the number of particles, which quickly renders exact numerical analysis impossible. A promising alternative is to harness the power of machine learning (ML) methods designed to deal with large datasets [1]. However, ML models, and especially neural networks (NNs), are known for their black-box construction, i.e., they usually hinder any insight into the reasoning behind their predictions. As a result, when we apply ML to novel problems, we can neither fully trust their predictions (lack of reliability) nor learn what the ML model has learned (lack of interpretability).
We present a set of Hessian-based methods (including influence functions) that open the black box of ML models, increasing their interpretability and reliability. We demonstrate how these methods can guide physicists in understanding the patterns responsible for a phase transition. We also show that influence functions make it possible to check whether an NN trained to recognize known quantum phases can predict new, unknown ones. We demonstrate this on both numerically simulated data from the one-dimensional extended spinless Fermi-Hubbard model [2] and experimental topological data [3]. We also show how to generate error bars for the NN’s predictions and how to check whether the NN predicts by extrapolation instead of extracting information from the training data [4]. The presented toolbox is entirely independent of the ML model’s architecture and is thus applicable to various physical problems.
[1] J. Carrasquilla (2020). Machine learning for quantum matter. Advances in Physics: X, 5:1.
[2] A. Dawid et al. (2020). Phase detection with neural networks: interpreting the black box. New J. Phys. 22, 115001.
[3] N. Käming, A. Dawid, K. Kottmann et al. (2021). Unsupervised machine learning of topological phase transitions from experimental data. Mach. Learn.: Sci. Technol. 2, 035037.
[4] A. Dawid et al. (2021). Hessian-based toolbox for interpretable and reliable machine learning in physics. arXiv:2108.02154.
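Influence functions estimate how the loss on a test point would change if a single training point were removed. Instead of retraining, they use the gradient of the loss and the inverse Hessian of the training objective. The sketch below shows this for L2-regularised linear regression, where the gradients and the Hessian have closed forms. It illustrates the general technique under that simplifying assumption; it is not the toolbox from [4], which targets neural networks. The names `fit_ridge` and `influence_on_test_loss` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimise (1/n) * sum 0.5 * (x_i . theta - y_i)^2 + (lam / 2) * |theta|^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)  # Hessian of the regularised empirical risk
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H

def influence_on_test_loss(X, y, x_test, y_test, lam):
    """First-order change of the test loss when each training point is removed."""
    n = len(y)
    theta, H = fit_ridge(X, y, lam)
    g_train = (X @ theta - y)[:, None] * X       # per-sample loss gradients, shape (n, d)
    g_test = (x_test @ theta - y_test) * x_test  # gradient of the test loss, shape (d,)
    # Removing z_i equals upweighting it by -1/n, so dL_test ~ (1/n) * g_test^T H^{-1} g_i.
    return g_train @ np.linalg.solve(H, g_test) / n
```

For a quadratic loss like this one, the predicted changes closely track the changes obtained by explicit leave-one-out retraining. No model has to be refit to get them.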
Talk 03: Beautiful Mind and RTB Auctions
Piotr Sankowski (MIM Solutions, IDEAS NCBR, University of Warsaw), Piotr Wygocki (MIM Solutions), Adam Witkowski (MIM Solutions), Michał Brzozowski (MIM Solutions)
Real-time bidding (RTB) ad auctions are gaining importance: in 2021, their global value is expected to exceed $155 billion. Because advertisers purchase individual views targeted at specific users, they have much more control over the course of their advertising campaigns and a chance to match the advertisement to the user. Our company has supported the creation of several systems operating in this market, and our clients include RTB House, Spicy Mobile and HitDuck.
In this presentation, I will discuss a fundamental change that has taken place in the RTB market in recent years, namely the transition from second-price to first-price auctions. First, I will present the reasons for this change and explain why some brokerage firms previously found it profitable to deceive clients about their auction mechanisms. Second, I will explain what this means for us as strategic players and how to optimize our bidding algorithms for first-price auctions. I will show how the assumption of a symmetric market, combined with Nash equilibrium, makes it possible to reduce advertising costs by over 40%.
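The textbook symmetric model shows why bidding changes under a first-price rule. Suppose n bidders have values drawn i.i.d. uniform on [0, 1]. The symmetric Nash equilibrium is then to bid (n - 1)/n of one's value, whereas truthful bidding earns zero surplus. The sketch below encodes that equilibrium and checks it by Monte Carlo. The uniform-value assumption and the function names are illustrative. This is not the speakers' bidding model, and it does not reproduce the 40% figure.

```python
import numpy as np

def equilibrium_bid(value, n_bidders):
    """Symmetric Nash equilibrium bid in a first-price auction with i.i.d. U[0, 1] values."""
    return value * (n_bidders - 1) / n_bidders

def expected_utility(my_value, my_bid, n_bidders, n_sims=200_000, seed=0):
    """Monte Carlo estimate of surplus when all rivals bid according to the equilibrium."""
    rng = np.random.default_rng(seed)
    rival_values = rng.uniform(size=(n_sims, n_bidders - 1))
    best_rival_bid = equilibrium_bid(rival_values, n_bidders).max(axis=1)
    wins = my_bid > best_rival_bid
    return float(np.mean(wins * (my_value - my_bid)))  # first price: the winner pays its own bid
```

For a value of 0.8 with five bidders, the equilibrium bid is 0.64. Bidding the full value wins more often but yields no profit.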
Talk 04: Cell-counting in human embryo time-lapse monitoring
Piotr Wygocki (MIM Solutions, University of Warsaw), Michał Siennicki (MIM Solutions), Tomasz Gilewicz (MIM Solutions), Paweł Brysch (MIM Solutions)
Embryo visual analysis is a non-invasive method of selecting blastocysts for transfer after in vitro fertilization. Currently, it is mostly performed by embryologists. One of the most prevalent biomarkers used for embryo selection is the timing of cell divisions, the so-called division times.
Thanks to the development of automatic embryo monitoring systems, we obtained more than 600 clips of embryos covering the first 5 days of incubation, from a single cell to a grown blastocyst ready for transfer. Photos were taken every 7 minutes, giving about one thousand frames per embryo. The division times were manually tagged by embryologists.
We created an ML system which, given an embryo time-lapse, predicts its division times. In 91% of cases, its prediction is within a 3% error interval. Our model consists of two levels: a 3D ConvNet that counts cells in short videos, and a second level which, based on the first model's predictions, returns the division times for the whole clip.
The topic will be presented by Piotr Wygocki and Michał Siennicki from MIM Solutions.
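In the authors' system, the second level is learned. The sketch below is a minimal rule-based stand-in that turns per-frame cell-count predictions into division times. It median-filters the counts, enforces that the count never decreases, and reads off the first frame at which each count is reached. Only the 7-minute frame interval comes from the abstract. The filter window, the `max_cells` limit and the function names are assumptions.

```python
import numpy as np

FRAME_INTERVAL_MIN = 7  # one frame every 7 minutes (from the abstract)

def median_filter(x, window=5):
    """Sliding median with edge padding to suppress isolated misclassified frames."""
    pad = window // 2
    padded = np.pad(np.asarray(x, dtype=float), pad, mode="edge")
    return np.median(np.lib.stride_tricks.sliding_window_view(padded, window), axis=1)

def division_times(cell_counts, max_cells=8, window=5):
    """Hours after the first frame at which the embryo first reaches 2, 3, ..., max_cells cells."""
    counts = np.maximum.accumulate(median_filter(cell_counts, window))  # counts never decrease
    times = {}
    for k in range(2, max_cells + 1):
        reached = np.flatnonzero(counts >= k)
        if reached.size:  # skip stages the embryo never reaches in the clip
            times[k] = reached[0] * FRAME_INTERVAL_MIN / 60.0
    return times
```

The median filter matters: without it, a single spurious high count early in the clip would be propagated forward by the running maximum.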
Talk 05: Detection of faulty specimens made of carbon fiber composites with deep learning and non-destructive testing techniques
Marek Sawicki (Wroclaw University of Science and Technology), Damian Pietrusiak (Wroclaw University of Science and Technology), Mariusz Ptak (Wroclaw University of Science and Technology)
In this work, the authors demonstrate the capabilities of deep learning and convolutional neural networks (CNNs) for investigating material defects in carbon fiber reinforced polymers (CFRP) using non-destructive testing (NDT) techniques. CFRP specimens with two holes, placed according to four random position schemes, were first preloaded on a tensile testing machine with 1000 cycles at 50 Hz and a 2 kN amplitude. Shortly after this preliminary phase, a static tensile test was performed up to rupture, with the material out of thermodynamic equilibrium. Disturbing factors such as the lack of thermodynamic equilibrium and the four random hole-position schemes were used to account for environmental influence factors in real-world applications. An IR camera and a regular photo camera were used to record the temperature and strain fields. The authors tested 30 specimens, one of which was excluded for technical reasons, with 51 acquisition points during the static tensile phase. This gives around 1500 datasets from the experiment (29 x 51 = 1479). Data augmentation multiplied this number by 4, giving around 6000 datasets. Thermography (IR imaging) was used to track the temperature signatures of regions that underwent irreversible thermodynamic changes; such symptoms are usually related to local material damage. The photo camera recorded the specimen surface during the static tensile test, and selected frames from the recording were used to calculate the strain field with digital image correlation (DIC). Because the two measurement principles differ, the results were fitted to a common format of 100 x 50 x n, where 100 and 50 are the height and width and n is the number of layers used for training. These arrays were the input to three CNN/DNN models. Raw values without pre-processing, including feature extraction, had no potential for determining the degree of material wear.
Filtering the signal for feature extraction required an extensive study of composite material mechanics. The first model was a CNN inspired by common architectures such as VGG16, VGG19 and ResNet. The second model was a DNN with a single vector input generated by principal component analysis (PCA); this architecture massively reduced computation time with no significant difference in the final predictions. The third model had two separate input paths that kept the DIC and IR data apart. In all cases, the task was posed as classification: the model predicts a class corresponding to a percentage range of the averaged tensile curve, and its outputs were checked against the true classes at the acquisition points. Finally, the authors performed three blind tests on specimens excluded from training, showing that the models are able to identify specimens with internal defects within a population of healthy specimens.
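The sketch below shows how the second model's input could be built. Each 100 x 50 x n sample is flattened into one vector. PCA, computed here with an SVD, then projects it onto a few principal components that feed a dense network. The number of components, the array sizes in the example and the function names are assumptions; the abstract does not specify them.

```python
import numpy as np

def pca_fit(samples, n_components):
    """Fit PCA to samples of shape (N, H, W, n); returns the mean and the top components."""
    X = samples.reshape(len(samples), -1)  # one flat vector per sample
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)  # rows of vt are principal axes
    return mean, vt[:n_components]

def pca_transform(samples, mean, components):
    """Project samples onto the components, giving (N, n_components) DNN input vectors."""
    X = samples.reshape(len(samples), -1)
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
samples = rng.normal(size=(40, 100, 50, 2))  # 40 hypothetical samples, n = 2 layers (e.g. IR and DIC)
mean, comps = pca_fit(samples, n_components=10)
Z = pca_transform(samples, mean, comps)
```

Replacing 10,000 raw inputs per sample with a handful of components is what makes the dense model cheap to train.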
Talk 06: Neural network-based left ventricle geometry prediction from CMR images with application in biomechanics
Agnieszka Borowska (University of Glasgow), Lukasz Romaszko (University of Glasgow), Alan Lazarus (University of Glasgow), David Dalton (University of Glasgow), Colin Berry (British Heart Foundation and University of Glasgow), Xiaoyu Luo (University of Glasgow), Dirk Husmeier (University of Glasgow), Hao Gao (University of Glasgow)
Combining biomechanical modelling of left ventricular (LV) function and dysfunction with cardiac magnetic resonance (CMR) imaging has the potential to improve the prognosis of patient-specific cardiovascular disease risks. Biomechanical studies of LV function in three dimensions usually rely on a computerized representation of the LV geometry based on finite element discretization, which is essential for numerically simulating in vivo cardiac dynamics. Detailed knowledge of the LV geometry is also relevant for various other clinical applications, such as assessing the LV cavity volume and wall thickness. Accurately and automatically reconstructing personalized LV geometries from conventional CMR images with minimal manual intervention remains a challenging task, yet it is a prerequisite for any subsequent automated biomechanical analysis. In this talk, I will present a deep learning-based automatic pipeline for predicting the three-dimensional LV geometry directly from routinely available CMR cine images, without the need to manually annotate the ventricular wall. The framework takes advantage of a low-dimensional representation of the high-dimensional LV geometry based on principal component analysis. I will discuss how the inference of passive myocardial stiffness is affected by using our automatically generated LV geometries instead of manually generated ones. These insights can be used to inform the development of statistical emulators of LV dynamics that avoid computationally expensive biomechanical simulations. The proposed framework enables accurate LV geometry reconstruction, outperforming previous approaches with a reconstruction error 50% lower than reported in the literature. I will further demonstrate that, for a nonlinear cardiac mechanics model, using our reconstructed LV geometries instead of manually extracted ones only moderately affects the inference of passive myocardial stiffness described by an anisotropic hyperelastic constitutive law.
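The sketch below illustrates a PCA shape model of this kind. Corresponding mesh nodes of the training geometries are stacked into vectors, and the leading principal modes are kept. A low-dimensional coefficient vector, for example one regressed by a network from CMR images, is then decoded back into node coordinates. The mesh layout, the number of modes and the function names are illustrative assumptions. The reconstruction error here is a simple mean node distance, which may differ from the metric used in the talk.

```python
import numpy as np

def fit_shape_model(shapes, n_modes):
    """shapes: (N, n_nodes, 3) meshes with corresponding nodes; returns mean and PCA modes."""
    X = shapes.reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]

def encode(nodes, mean, modes):
    """Project one mesh onto the modes (the quantity a network would learn to predict)."""
    return (nodes.ravel() - mean) @ modes.T

def decode(coeffs, mean, modes):
    """Turn low-dimensional coefficients back into (n_nodes, 3) node coordinates."""
    return (mean + coeffs @ modes).reshape(-1, 3)

def mean_node_error(pred_nodes, true_nodes):
    """Average Euclidean distance between corresponding nodes."""
    return float(np.linalg.norm(pred_nodes - true_nodes, axis=1).mean())
```

Because the network only has to output a few coefficients rather than thousands of node coordinates, the learning problem becomes much smaller.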
The developed methodological framework has the potential to take an important step towards personalized medicine by eliminating the need for time-consuming and costly manual operations. In addition, the proposed method automatically maps the CMR scan into a low-dimensional representation of the LV geometry, which constitutes an important stepping stone towards the development of an LV geometry-heterogeneous emulator.
Talk 07: Synerise at KDD Cup 2021: Node Classification in massive heterogeneous graphs
Michał Daniluk (Synerise), Jacek Dąbrowski (Synerise), Barbara Rychalska (Synerise), Konrad Gołuchowski (Synerise)
We recently won three highly acclaimed international AI competitions. In this talk, I will present our solution to one of them, the KDD Cup 2021. We achieved 3rd place in this competition, after the Baidu and DeepMind teams, beating OPPO Research, Beijing University, Intel & KAUST and many others. The task was to predict the subject of scientific publications on the basis of the edges of a heterogeneous graph of papers, citations, authors and scientific institutions. The graph, of unprecedented size (~250 GB), contained 244,160,499 vertices of 3 types connected by as many as 1,728,364,232 edges, which made it possible to verify algorithms in terms of their readiness to operate on very large-scale data. We proposed an efficient model based on our previously introduced algorithms, EMDE and Cleora, on top of a simple feed-forward neural network.
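Cleora, one of the building blocks mentioned above, produces node embeddings by iteration. It repeatedly multiplies an initial embedding matrix by the graph's random-walk transition matrix and normalises the rows. The sketch below follows that idea on a dense toy adjacency matrix with random initialisation. A graph with 244 million vertices would require sparse matrices and the original library. EMDE and the feed-forward classifier are not shown. Treat this as an illustration of the idea, not the team's pipeline.

```python
import numpy as np

def cleora_like_embeddings(edges, n_nodes, dim=16, n_iter=10, seed=0):
    """Propagate random initial vectors along a random-walk matrix, L2-normalising rows each step.

    Dense toy version; assumes every node has at least one edge."""
    A = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    M = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    E = np.random.default_rng(seed).uniform(-1.0, 1.0, size=(n_nodes, dim))
    for _ in range(n_iter):
        E = M @ E                                      # average the neighbours' vectors
        E /= np.linalg.norm(E, axis=1, keepdims=True)  # keep rows on the unit sphere
    return E
```

Nodes in the same densely connected region end up with nearly identical vectors. A downstream classifier can exploit this.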
Talk 08: IDEAS NCBR - about the new research center
Piotr Sankowski (IDEAS NCBR)
How many of you wanted to conduct research? And how many of you did not choose a scientific career because of the limited support the academic system provides? I would like to present an option you may follow to pursue your own research interests. I am more than happy to announce that this year a new scientific research center, IDEAS NCBR, was established.
IDEAS NCBR is a research program in AI and the digital economy. We believe that our center will become a discussion platform between academia and business. We aim to provide a high-quality working environment through attractive employment conditions and a leading mentoring program. We support researchers in delivering high-quality R&D projects by providing the necessary tools and networking opportunities, which can become a milestone in building your scientific career or testing your innovative idea. It will give you a chance to carry out an independent project, focus on solving problems and experience interesting challenges. Our mission is to become the biggest AI research center in Poland.
So far, we have formed an incredible scientific board consisting of top experts from AI research and the business environment. We have also created two research groups, led by Prof. Piotr Sankowski and Prof. Stefan Dziembowski, that will focus on blockchain, and on intelligent algorithms and learned data structures.
Would you like to learn more and, together with us, create the new lighthouse AI project in Poland?
Talk 09: Crisis Situation Assessment Using Different NLP Methods
Agnieszka Pluwak (SentiOne), Emilia Kacprzak (SentiOne), Michał Lew (SentiOne), Michał Stańczyk (SentiOne), Aleksander Obuchowski (Talkie.ai)
From the perspective of marketing studies, a reputational crisis is recognized as such only post factum, once measurable financial loss has occurred. For risk management purposes, however, there is strong demand to discover its signs as early as possible. This is where artificial intelligence has found application since the start of the Facebook era. Emotion- or sentiment-classification algorithms based on BiLSTM neural networks or transformer architectures achieve very good F1 scores. Nevertheless, the scholarly literature offers very few approaches to the ante-factum detection of reputational crises from an NLP point of view, and not every peak of mentions with negative sentiment is, by definition, a reputational crisis. There are ample general sentiment classification tools dedicated to a specific social medium, e.g. Twitter, while reputational crises often spread across various Internet sources. They also tend to be highly unpredictable in the way they appear and spread online, and very few studies of their development have so far been conducted from the perspective of NLP tool design.
Therefore, in our work we try to answer the question: how can we track reputational crises quickly and precisely across multiple communication channels, and what do current NLP methods offer in this respect?
For this purpose we have:
- consulted Internet monitoring experts and defined major crisis topics for three business domains,
- built and tested three different approaches to crisis detection (a HAN-based emotion detection model, heuristic crisis detection models for predefined risks, a statistical mention peak analysis tool with an ML-based summarization algorithm) to track most Internet sources and cover both explicit and implicit content,
- developed a scoring technique that evaluates a company's crisis level ("crisisiveness") at a given point in time.
We offer a comparative analysis of NLP tools and qualitative semantic methods applied to real-life reputational crises that occurred in Poland within the last two decades.
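The statistical mention-peak step can be illustrated with a trailing-window z-score. A day is flagged when its count of negative mentions exceeds the mean of the preceding days by more than a chosen number of standard deviations. As the abstract stresses, such a flag is only a candidate, since not every peak is a reputational crisis. The window length, the threshold and the function name are assumptions, not parameters of the SentiOne tool.

```python
import numpy as np

def mention_peaks(daily_counts, window=7, z_thresh=3.0):
    """Indices of days whose negative-mention count is an outlier w.r.t. the previous `window` days."""
    counts = np.asarray(daily_counts, dtype=float)
    peaks = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        std = history.std() or 1.0  # flat history: fall back to a unit scale
        if (counts[t] - history.mean()) / std > z_thresh:
            peaks.append(t)
    return peaks
```

Using only past days in the window keeps the detector usable in real time, which is the point of ante-factum detection.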
Talk 10: Manipulating explainability and fairness in machine learning
Hubert Baniecki (MI2 DataLab, Warsaw University of Technology)
Are explainability methods black boxes themselves?
As explainability and fairness methods have become widely adopted in various machine learning applications, a crucial discussion has arisen about their validity. Precise measures and evaluation approaches are required for the trustworthy adoption of explainable machine learning techniques. Evaluating them should be as obvious as evaluating machine learning models, especially when working with various stakeholders.
Careless adoption of these methods is irresponsible. Historically, adversarial attacks have exploited machine learning models in diverse ways; hence, defense mechanisms were proposed, e.g. ones using model explanations. Nowadays, ways of manipulating explainability and fairness in machine learning have become more evident. They might be used for adversarial purposes, but more importantly they highlight the shortcomings of explanations and the need for their evaluation.
In this talk, I will introduce the fundamental concepts that fuse the domains of adversarial and explainable machine learning. I will then summarise the most impactful methods and results that have recently appeared in the literature. This talk is based on a live survey of related work, covering over 30 relevant papers, available at https://github.com/hbaniecki/adversarial-explainable-ai.
Talk 11: LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood
Piotr Tempczyk (University of Warsaw), Rafał Michaluk (University of Warsaw), Adam Goliński (University of Oxford), Przemysław Spurek (Jagiellonian University), Jacek Tabor (Jagiellonian University)
We investigate the problem of local intrinsic dimension (LID) estimation. The LID of a data point is the minimal number of coordinates needed to describe the point and its neighborhood without significant information loss. Existing methods for LID estimation do not scale well to high-dimensional data because they rely on the nearest-neighbor structure, which may cause problems due to the curse of dimensionality. We propose a new method for Local Intrinsic Dimension estimation using Likelihood (LIDL), which yields more accurate LID estimates thanks to recent progress in likelihood estimation in high dimensions, such as normalizing flows (NF). We show that our method yields more accurate estimates than previous state-of-the-art LID estimation algorithms on standard benchmarks for this problem and that, unlike other methods, it scales well to problems with thousands of dimensions. We anticipate that this new approach will open the way to accurate LID estimation for real-world, high-dimensional datasets, and we expect it to improve further with advances in the NF literature.
The workshop version of the paper and the poster are available here: https://openreview.net/forum?id=ij1aPkDZBYV
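The principle behind LIDL can be illustrated on a case with a known answer. Take data lying on a d-dimensional subspace of R^D and perturb them with Gaussian noise of scale delta. For small delta, the log-density at a data point then behaves like (d - D) * log(delta) + const. The LID can therefore be read off as D plus the slope of log-density versus log(delta). The sketch below uses an exact Gaussian density in place of a normalizing flow. That substitution, the toy data, the delta grid and the function names are assumptions for illustration.

```python
import numpy as np

def log_density_at_origin(delta, d, D):
    """Exact log-density at x = 0 of data N(0, I_d) embedded in R^D, blurred by N(0, delta^2 I_D).

    This closed form stands in for the normalizing-flow estimate used by LIDL."""
    var_on = 1.0 + delta**2  # d in-manifold directions: data variance plus noise
    var_off = delta**2       # D - d off-manifold directions: noise only
    return (-0.5 * d * np.log(2 * np.pi * var_on)
            - 0.5 * (D - d) * np.log(2 * np.pi * var_off))

def lidl_estimate(log_densities, deltas, D):
    """LID = D + slope of log p_delta(x) against log delta (least-squares line)."""
    slope = np.polyfit(np.log(deltas), log_densities, 1)[0]
    return D + slope

deltas = np.geomspace(1e-3, 1e-2, 5)  # small noise scales (assumed grid)
lid = lidl_estimate(log_density_at_origin(deltas, d=3, D=10), deltas, D=10)
```

Here `lid` comes out very close to 3, the true dimension. In practice, the quality of the estimate depends on how well the density model captures the noise-scale dependence.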
Talk 12: Application of Machine Learning for multi-phase flow pattern recognition
Rafał A. Bachorz (PSI Polska Sp. z o.o.), Adam Karaszewski (PSI Polska Sp. z o.o.), Grzegorz Miebs (PSI Polska Sp. z o.o.)
Pipeline transportation safety requires the application of LDS (Leak Detection Systems). Equipping entire pipeline lengths with continuous physical leak detection based on hydrocarbon-sensing cables, fiber-optic temperature sensing or vibration detection is costly and is not implemented for existing pipelines. Scheduled checks with internal inspection gauges, camera-fitted drones or dog patrols are likely to miss or delay spill detection. On the other hand, deploying mature computational methods that use sparsely installed flow, pressure or acoustic sensors provides enough process data for robust, accurate and precise leak detection and localization.
The work focuses on a particularly difficult case: pressure-based LDS under multi-phase flow conditions. To maintain the accuracy of leak detection and localization, and to continuously estimate the reliability and robustness of LDS models, it is important to monitor the profile of flow patterns along the pipeline. Hydraulic phenomena in multi-phase flow are particularly complex, and one of the features determining the fluid properties is the flow pattern. Classically, one distinguishes between Annular, Bubble, Dispersed bubble, Intermittent, Stratified smooth and Stratified wavy patterns. These flow patterns are difficult to determine from first principles. A possible remedy is to use empirical data to create predictive models capable of assigning the flow pattern. The predictive strength of the resulting models was carefully checked, and machine learning interpretability techniques were employed to obtain a deeper understanding of the predictions; in particular, prediction breakdown and partial dependence plots were applied. Combined with domain knowledge, this yielded an efficient tool for root cause analysis. Applying these interpretability techniques to understand the models' decisions built confidence and trust that the predictions are fair and based on clear presumptions.
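Partial dependence plots, one of the interpretability techniques mentioned above, sweep a single input feature over a grid. At each grid value they average the model's predicted class probabilities, while the other features keep their observed values. The sketch below computes this for any `predict_proba`-style function. The toy logistic model and its two input features are assumptions, not the models from the talk; the features might hypothetically stand for superficial gas and liquid velocities.

```python
import numpy as np

def partial_dependence(predict_proba, X, feature, grid):
    """Mean predicted class probabilities while column `feature` is set to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # same value for every row, other columns unchanged
        curve.append(predict_proba(X_mod).mean(axis=0))
    return np.array(curve)  # shape (len(grid), n_classes)

def toy_predict_proba(X):
    """Hypothetical two-class model (e.g. stratified vs. intermittent flow)."""
    p = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))
    return np.column_stack([1.0 - p, p])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # hypothetical columns: superficial gas and liquid velocity
pd_gas = partial_dependence(toy_predict_proba, X, feature=0, grid=np.linspace(-2, 2, 5))
```

Plotting each column of `pd_gas` against the grid shows how the predicted probability of each flow pattern responds to the feature on average.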
Talk 13: Deep learning for decoding 3D hand translation based on ECoG signal
Maciej Śliwowski (Univ. Grenoble Alpes), Matthieu Martin (Univ. Grenoble Alpes), Antoine Souloumiac (Université Paris-Saclay), Pierre Blanchart (Université Paris-Saclay), Tetiana Aksenova (Univ. Grenoble Alpes)
Brain-computer interfaces (BCIs) may significantly improve the quality of life of tetraplegic patients. BCIs create an alternative way of communication between humans and the environment and thus could potentially compensate for the loss of motor function. However, most current systems suffer from low decoding accuracy and are not easy for patients to use. Among invasive BCIs, electrocorticography-based (ECoG) systems can provide better signal characteristics than electroencephalography (EEG) while being less invasive than intracortical recordings. Most studies predicting continuous hand translation trajectories from ECoG use linear models that may be too simple to analyze brain processes. Models based on deep learning (DL) have proved efficient in many machine learning problems; thus, they emerge as a way to create robust, high-level representations of brain signals.
This work evaluated several DL-based models for predicting 3D hand translation from ECoG time-frequency features. The data were recorded in a closed-loop experiment ("BCI and Tetraplegia" clinical trial, clinicaltrials.gov NCT02550522) with a tetraplegic subject controlling the hand movements of a virtual avatar. We started the analysis with a multilayer perceptron taking vectorized features as input. Then, we proposed convolutional neural networks (CNNs), which take matrix-organized inputs that approximate the spatial relationship between the electrodes. In addition, we investigated the usefulness of long short-term memory (LSTM) networks for analyzing temporal information.
Results showed that CNN-based architectures performed better than the current state-of-the-art multilinear model on the analyzed ECoG dataset. The best architecture used a CNN to analyze the spatial representation of time-frequency features, followed by an LSTM exploiting the sequential character of the desired hand trajectory. Compared to the multilinear model, DL-based solutions increased the average cosine similarity from 0.189 to 0.302 for the left hand and from 0.157 to 0.249 for the right hand.
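The reported metric compares predicted and actual hand translations. The abstract does not specify the exact aggregation. The minimal sketch below assumes that cosine similarity is computed between the predicted and target 3D vectors at each time step and then averaged over time.

```python
import numpy as np

def mean_cosine_similarity(pred, target, eps=1e-12):
    """Average over time of the cosine between predicted and true 3D translations, shape (T, 3)."""
    dots = np.sum(pred * target, axis=1)
    norms = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1)
    return float(np.mean(dots / (norms + eps)))  # eps guards against zero-length vectors
```

A value of 1 means every predicted translation points in the true direction, 0 means no directional agreement on average, and -1 means systematically opposite directions.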
This study showed that CNN and LSTM could improve ECoG signal decoding and increase the quality of interaction for ECoG-based BCI.
Talk 14: A Bayesian Nonparametrics View into Deep Representations
Michał Jamroż (AGH University of Science and Technology), Marcin Kurdziel (AGH University of Science and Technology)
The presentation/poster concerns the publication of the same title, "A Bayesian Nonparametrics View into Deep Representations", published at the NeurIPS 2020 conference by M. Jamroż, M. Kurdziel and M. Opala. This work leverages Bayesian nonparametric methods to investigate the complexity of representations learned by CNNs. It compares different network variants, including generalizing nets, memorizing nets and nets trained with regularization. The same concepts of complexity and complexity measures are also employed to investigate the latent space of variational autoencoders (VAEs).
Talk 15: Time aspect in making an actionable prediction of a conversation breakdown
Piotr Janiszewski (Poznan University of Technology), Mateusz Lango (Poznan University of Technology), Jerzy Stefanowski (Poznan University of Technology)
Online harassment is an important problem in modern societies, usually mitigated by the manual work of website moderators, often supported by machine learning tools. The vast majority of previously developed methods enable only retrospective detection of online abuse, e.g., by automatic hate speech detection. Such methods fail to fully protect users, as the harm related to the abuse has already been inflicted by the time it is detected. Recently proposed proactive approaches that detect derailing online conversations can help moderators prevent conversation breakdown. However, they do not predict the time left until the breakdown, which makes it hard in practice to prioritize moderators’ work. In this work, we propose a new method based on deep neural networks that predicts both the possibility of conversation breakdown and the time left until conversation derailment. We also introduce three specialized loss functions and propose appropriate metrics. The conducted experiments demonstrate that the method, besides providing additional valuable time information, also improves on the current state-of-the-art method in the standard breakdown classification task.
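The abstract does not describe its three loss functions. The sketch below is therefore only one plausible way to train such a joint model. It combines binary cross-entropy on whether the conversation derails with a squared error on log time-to-derailment. The time term counts only for conversations that actually derail. The weighting `alpha`, the log transform, the masking and the function name are assumptions.

```python
import numpy as np

def joint_loss(p_breakdown, t_pred, derails, t_true, alpha=1.0, eps=1e-7):
    """BCE on the breakdown label plus squared error on log time-to-derailment.

    The time term only counts conversations that actually derail (derails == 1)."""
    p = np.clip(p_breakdown, eps, 1 - eps)
    bce = -np.mean(derails * np.log(p) + (1 - derails) * np.log(1 - p))
    mask = derails.astype(bool)
    if mask.any():
        time_term = np.mean((np.log1p(t_pred[mask]) - np.log1p(t_true[mask])) ** 2)
    else:
        time_term = 0.0
    return bce + alpha * time_term
```

Masking matters because conversations that never derail have no meaningful time-to-breakdown target.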
Talk 16: Self-attention based encoder models for strong lens detection
Hareesh Thuruthipilly (National Centre of Nuclear Research, Poland), Adam Zadrożny (National Centre of Nuclear Research, Poland), Agnieszka Pollo (National Centre of Nuclear Research, Poland)
In the talk, we would like to present an architecture for image processing based on transformers and inspired by the DETR architecture introduced by Facebook, as a better alternative to regular CNNs and RNNs. We implemented our architecture on the Bologna Lens Challenge (a mock data challenge for finding gravitational lenses), and we were able to surpass the best existing automated methods that participated in the challenge. Our proposed architecture has a very low computational cost and has shown better stability than the CNNs that participated in the challenge. In addition, the proposed architecture seems to be resistant to overfitting.
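Transformer encoders such as the one described are built from scaled dot-product self-attention, in which every image-patch embedding attends to every other one. The sketch below shows a single attention head in NumPy. The random projection matrices and patch embeddings are placeholders. The DETR-inspired architecture from the talk contains further components, such as multiple heads, feed-forward layers and positional encodings, which are not shown here.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over n token embeddings X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (n, n) similarity of every token pair
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))  # 6 hypothetical image-patch embeddings of width 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Unlike a convolution, each output row can draw on patches anywhere in the image. This global receptive field is one motivation for using attention on lensing images.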
Talk 17: Generative models in continual learning
Kamil Deja (Warsaw University of Technology), Wojciech Masarczyk (Warsaw University of Technology), Paweł Wawrzyński (Warsaw University of Technology), Tomasz Trzciński (Warsaw University of Technology)
Neural networks suffer from catastrophic forgetting, defined as an abrupt loss of performance on previously learned tasks when acquiring new knowledge. For instance, if a network previously trained to detect virus infections is retrained with data describing a recently discovered strain, its diagnostic precision for all previous strains drops significantly. To mitigate this, we can retrain the network from scratch on a joint dataset, yet this is often infeasible due to the size of the data, or impractical when retraining takes longer than it takes to discover a new strain. Catastrophic forgetting severely limits the capabilities of contemporary neural networks, and continual learning aims to address this pitfall. Although current approaches to continual learning emphasize the sequential nature of learning new discriminative tasks, we argue that a key attribute of how humans learn new things is discovering the structure of information without supervision. In this talk, I will present several ideas on how generative models and their latent data representations can be used to address forgetting in neural networks.
Talk 18: LOGAN: Low-latency, On-device, GAN Vocoder
Dariusz Piotrowski (Amazon), Mikołaj Olszewski (Amazon), Arnaud Joly (Amazon)
In recent years, neural networks have driven technological advancement across a multitude of domains, including TTS (text-to-speech) technology. Meanwhile, both the economic and environmental costs of training and running huge neural networks have been growing rapidly, creating a need to optimise their efficiency. Making models smaller and faster not only reduces their economic and environmental impact but also allows running them offline, directly on end users’ devices. Running inference offline helps protect users’ data, as it never leaves their devices, and significantly reduces latency by removing the time needed for cloud communication. Furthermore, it solves the problem of missing or intermittent connectivity.
A typical TTS system consists of two models: a context generator, which creates a speech representation as mel-spectrograms from a phoneme sequence, and a vocoder, which reconstructs the waveform. This work focuses on the latter. As a baseline, we chose a state-of-the-art GAN-based vocoder, Multi-Band MelGAN. It can synthesise high-quality audio with a complexity of 0.95 GFLOPS, which is fast enough to run in real time on high-powered embedded devices. However, this is not enough for real-world scenarios where multiple systems run on the device at once. Moreover, the majority of devices are lower-powered, and the latency would be prohibitively high, especially for longer utterances.
Models created with efficiency in mind are often orders of magnitude faster with minimal to no quality degradation. In this work, we propose a number of methods, which can also be applied in other domains, to obtain a model suitable for real-world scenarios.
To decrease latency, inference is typically run in a streamable fashion, meaning that the model is fed the input data in chunks and produces the output in corresponding chunks. Multi-Band MelGAN is a convolution-based model which, in contrast to traditional autoregressive models, is not streamable by default. If the input were split into parts, fed into a convolutional network, and the outputs then concatenated, the result would differ from, and possibly be worse than, the result of feeding the whole input into the network at once. Without streamable synthesis, latency would grow linearly with the length of the target utterance. To address this issue, we propose a method for running inference in a streamable fashion that is suitable for convolutional models: it carefully overlaps chunks of the model input and extracts parts of the computational graph based on the model's receptive field.
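The overlap rule can be checked on a toy stack of 1D convolutions without padding. Each output sample depends on a window of input samples as long as the receptive field. Output chunk [start, stop) can therefore be computed from input samples [start, stop + receptive_field - 1). Concatenating such chunks reproduces the full-signal output exactly. The kernels, chunk sizes and function names are illustrative assumptions. A real vocoder also has upsampling layers and padding, which change the index bookkeeping.

```python
import numpy as np

def conv_stack(x, kernels):
    """Apply 'valid' 1D convolutions in sequence; each layer shortens the signal by len(k) - 1."""
    for k in kernels:
        x = np.convolve(x, k, mode="valid")
    return x

def receptive_field(kernels):
    """Number of input samples that influence one output sample."""
    return sum(len(k) - 1 for k in kernels) + 1

def streamed(x, kernels, chunk):
    """Compute the same output chunk by chunk, overlapping inputs by receptive_field - 1 samples."""
    rf = receptive_field(kernels)
    n_out = len(x) - rf + 1
    pieces = []
    for start in range(0, n_out, chunk):
        stop = min(start + chunk, n_out)
        # output samples [start, stop) only depend on input samples [start, stop + rf - 1)
        pieces.append(conv_stack(x[start:stop + rf - 1], kernels))
    return np.concatenate(pieces)
```

Without the overlap, each chunk would lose receptive_field - 1 boundary samples. Those gaps are the discontinuities that naive chunking introduces.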
To further improve performance and reduce latency, we introduce quarter-precision quantisation and overcome the quality degradation it typically causes by using a μ-law output together with pre-emphasis filtering and quantisation-aware training. Moreover, we increase the parallelism of the computations by increasing the number of frequency sub-bands, and we optimise the architecture by reducing the number of filters. As a result, the proposed model achieves an almost 44x higher RTF (Real Time Factor, defined here as audio length divided by inference time), reduces latency by 78% and is almost 80% smaller, while lowering quality by only 1 point on a relative MUSHRA score (MUSHRA is a listening-test methodology for evaluating the perceived quality of the output of lossy audio compression algorithms).
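The output-side tricks can be sketched numerically. The example below shows only pre-emphasis filtering plus 8-bit μ-law companding and their inverses; it does not cover weight quantisation or quantisation-aware training. The values μ = 255 and the pre-emphasis coefficient alpha = 0.85 are assumptions, since the abstract does not state them.

```python
import numpy as np

MU = 255  # assumption: 8-bit mu-law (256 output levels)

def pre_emphasis(x, alpha=0.85):
    """y[n] = x[n] - alpha * x[n-1]: boosts high frequencies before quantisation."""
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

def de_emphasis(y, alpha=0.85):
    """Inverse IIR filter x[n] = y[n] + alpha * x[n-1], applied after decoding."""
    x = np.empty_like(y)
    x[0] = y[0]
    for n in range(1, len(y)):
        x[n] = y[n] + alpha * x[n - 1]
    return x

def mu_law_encode(x, mu=MU):
    """Compand x in [-1, 1] logarithmically, then quantise to integers 0..mu."""
    x = np.clip(x, -1.0, 1.0)
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((companded + 1) / 2 * mu).astype(np.int64)

def mu_law_decode(codes, mu=MU):
    """Map integer codes back to [-1, 1] and expand the companding."""
    companded = 2 * codes.astype(np.float64) / mu - 1
    return np.sign(companded) * np.expm1(np.abs(companded) * np.log1p(mu)) / mu

t = np.arange(16000) / 16000                # one second at 16 kHz
audio = 0.5 * np.sin(2 * np.pi * 220 * t)
codes = mu_law_encode(pre_emphasis(audio))  # what a quantised vocoder output could look like
restored = de_emphasis(mu_law_decode(codes))
print(codes.min(), codes.max())
print(np.max(np.abs(restored - audio)))     # small reconstruction error
```

μ-law spends more of the 256 levels on quiet samples, where coarse quantisation is most audible. Pre-emphasis flattens the spectrum before quantisation, so the de-emphasis filter then attenuates the high-frequency part of the quantisation noise.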
Talk 19: Foundations of Interpretable and Reliable Reinforcement Learning
Jacek Cyranka (University of Warsaw)
This talk will showcase my recent research (work in progress) on interpretable and reliable reinforcement learning algorithms, with applications to robotics, space missions and computer games. It covers two topics:
1. The so-called state-planning policy method. I will present preliminary results obtained in the MuJoCo and SafetyGym simulation environments and discuss several prospective projects related to a vision-based state-planning policy method and to offline reinforcement learning.
2. The idea of developing a set of environments simulating space missions, i.e. deploying a ship into space where it interacts gravitationally with planets, with the aim of reaching a fixed target position or entering an orbit under prescribed hard safety constraints (avoiding crashes into other objects that exert a gravitational pull).
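To make the second idea concrete, here is a minimal sketch of what such an environment could look like. Everything in it is hypothetical: the class name, the point-mass dynamics with semi-implicit Euler integration, the reward shaping, and the older Gym-style `(obs, reward, done, info)` return value. In the spirit of constrained-RL benchmarks, the safety constraint is reported as a separate `cost` signal rather than folded into the reward.

```python
import numpy as np

class ToyGravityEnv:
    """Hypothetical 2-D space-mission environment: a ship under planetary gravity must
    reach a target; entering any planet's radius is a hard safety-constraint violation."""

    def __init__(self, planets, target, dt=0.01, g=1.0, max_thrust=1.0, target_radius=0.1):
        self.planets = np.asarray(planets, dtype=float)  # rows: x, y, mass, radius
        self.target = np.asarray(target, dtype=float)
        self.dt, self.g = dt, g
        self.max_thrust, self.target_radius = max_thrust, target_radius

    def reset(self, position, velocity=(0.0, 0.0)):
        self.pos = np.asarray(position, dtype=float)
        self.vel = np.asarray(velocity, dtype=float)
        return self._obs()

    def _obs(self):
        return np.concatenate([self.pos, self.vel, self.target - self.pos])

    def _gravity(self):
        offsets = self.planets[:, :2] - self.pos          # ship -> planet vectors
        dist = np.linalg.norm(offsets, axis=1, keepdims=True)
        return (self.g * self.planets[:, 2:3] * offsets / dist**3).sum(axis=0)

    def step(self, thrust):
        thrust = np.asarray(thrust, dtype=float)
        norm = np.linalg.norm(thrust)
        if norm > self.max_thrust:                        # engine limit
            thrust = thrust * (self.max_thrust / norm)
        # semi-implicit Euler: update velocity first, then position
        self.vel = self.vel + (self._gravity() + thrust) * self.dt
        self.pos = self.pos + self.vel * self.dt
        distance = np.linalg.norm(self.target - self.pos)
        crashed = bool(np.any(np.linalg.norm(self.planets[:, :2] - self.pos, axis=1)
                              < self.planets[:, 3]))
        reached = bool(distance < self.target_radius)
        reward = -distance * self.dt + (10.0 if reached else 0.0)
        cost = 1.0 if crashed else 0.0                    # constraint signal kept separate from reward
        done = crashed or reached
        return self._obs(), reward, done, {"cost": cost, "reached": reached}

env = ToyGravityEnv(planets=[[0.0, 0.0, 1.0, 0.5]], target=[3.0, 3.0])
obs = env.reset(position=[0.0, 2.0])
for _ in range(10_000):                 # no thrust: the ship falls onto the planet
    obs, reward, done, info = env.step([0.0, 0.0])
    if done:
        break
print(info)  # {'cost': 1.0, 'reached': False}
```

Because the cost is separate from the reward, a constrained algorithm can enforce a hard limit on it instead of trading crashes against progress through reward shaping.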
Talk 20: Law, Graphs and AI
Adam Zadrożny (National Centre for Nuclear Research, Poland)
Legal systems are among the most complicated human-made creations. They are hard to control because of their vast and sparse structure. For someone without legal training, it is hard even to get a good glimpse of a small part of them.
But let’s look at the law from a computer scientist's perspective. It is divided into logical parts: Acts, Chapters, Articles, Points, … What is more, there are references between related parts. So we have a hyperlink structure that allows us to represent a legal system as a graph.
Using this structure together with NLP methods, we can efficiently visualise the system and analyse interactions between its parts. This also suggests that data mining and NLP can be used for consistency checks, for detecting redundancy, and for detecting unintended changes propagated through the hyperlink structure.
In the talk, I will explain how such an analysis could be done for Polish and European law, and how it might help decide what to clean up in the legal system after many hurried, messy changes were introduced during the COVID pandemic. Last but not least, I will discuss how citizens might gain better control over the changes that proposed Acts could introduce.
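The graph view directly supports some of these checks. The following is a minimal sketch on a hypothetical toy corpus; the unit identifiers and the reference list are invented for illustration. Reversing the reference edges and running a breadth-first search finds every provision that is directly or transitively affected when one provision is amended. A second check flags references that point to units missing from the corpus.

```python
from collections import deque

# Hypothetical toy corpus: unit id -> ids of the units it refers to.
references = {
    "Act A/Art. 1": ["Act A/Art. 2"],
    "Act A/Art. 2": [],
    "Act B/Art. 5": ["Act A/Art. 1"],
    "Act C/Art. 3": ["Act B/Art. 5", "Act A/Art. 9"],  # Art. 9 does not exist
}

def referenced_by(refs):
    """Invert the edges: for each unit, the units that cite it."""
    inverse = {unit: [] for unit in refs}
    for src, targets in refs.items():
        for dst in targets:
            inverse.setdefault(dst, []).append(src)
    return inverse

def affected_by_change(refs, changed):
    """All units that refer to `changed` directly or transitively (breadth-first search)."""
    inverse = referenced_by(refs)
    seen, queue = set(), deque([changed])
    while queue:
        for citing in inverse.get(queue.popleft(), []):
            if citing not in seen:
                seen.add(citing)
                queue.append(citing)
    return seen

def dangling_references(refs):
    """References pointing at units that are not in the corpus."""
    return [(src, dst) for src, targets in refs.items() for dst in targets if dst not in refs]

print(sorted(affected_by_change(references, "Act A/Art. 2")))
# ['Act A/Art. 1', 'Act B/Art. 5', 'Act C/Art. 3']
print(dangling_references(references))
# [('Act C/Art. 3', 'Act A/Art. 9')]
```

On real legislation, the edges would first have to be extracted from the text with NLP, which is the hard part. The graph algorithms themselves stay this simple.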
Talk 21: Color recognition should be simple - but isn’t - case study of player clustering in football.
Marcin Tuszyński (Respo.Vision), Łukasz Grad (MIMUW, Respo.Vision), Wojtek Rosiński (Respo.Vision)
Color recognition seems like a simple task; in fact, there are over 500 repositories on GitHub and dozens of easy-to-find blog posts on the topic. While common, it is not always easy to use and adapt to complex environments.
The problem we are dealing with is fully automatic, real-time player clustering in football. Our goal is to decide, during a particular game, which of the two teams a person belongs to. Our setting, information extraction from any broadcast-like match video, creates an extraordinarily difficult environment for machine learning models. Large differences in camera angles, variable weather and lighting conditions, temporary occlusions and highly dynamic movement are only some of the factors that must be taken into account when creating a highly robust solution.
In our talk, we will give a short introduction to Gaussian Mixture Models (GMM) and how they can be used to solve the problem of player clustering. We will present our results using a GMM based on a color-based representation of each player, applied to both online and offline cases. We will show how different color models and player representations affect clustering performance and the ability to detect non-player outliers. Next, we will focus on the shortcomings of this approach, especially during model fitting at the beginning of the game, which ultimately led us to develop alternative methods that do not rely on online estimation.
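For readers new to GMMs, here is a minimal sketch of the basic idea. It fits a two-component, full-covariance mixture with plain EM to per-player colour descriptors, then flags outliers such as a referee by their low mixture log-likelihood. Several assumptions apply: the descriptors are synthetic mean shirt colours in RGB, the initialisation is farthest-point, and the outlier threshold is taken from the training data. In practice one would likely use scikit-learn's `GaussianMixture` (`fit`, `predict`, `score_samples`); the EM loop is written out here only to keep the sketch dependency-light.

```python
import numpy as np

def logsumexp(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def log_gauss(X, mean, cov):
    """Log-density of a multivariate normal at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def component_log_probs(X, weights, means, covs):
    return np.stack([np.log(w) + log_gauss(X, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)

def fit_gmm(X, k=2, iters=100, reg=1e-6):
    """EM for a full-covariance Gaussian mixture with farthest-point initialisation."""
    n, d = X.shape
    means = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - m, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        log_p = component_log_probs(X, weights, means, covs)       # E-step
        resp = np.exp(log_p - logsumexp(log_p)[:, None])
        nk = resp.sum(axis=0)                                      # M-step
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

rng = np.random.default_rng(0)
team_a = rng.normal([0.8, 0.1, 0.1], 0.03, size=(50, 3))   # reddish shirts (RGB in [0, 1])
team_b = rng.normal([0.1, 0.1, 0.8], 0.03, size=(50, 3))   # bluish shirts
X = np.vstack([team_a, team_b])
weights, means, covs = fit_gmm(X)

labels = component_log_probs(X, weights, means, covs).argmax(axis=1)
train_ll = logsumexp(component_log_probs(X, weights, means, covs))
referee = np.array([[0.1, 0.8, 0.1]])                       # green kit, in neither team
referee_ll = logsumexp(component_log_probs(referee, weights, means, covs))
print(len(set(labels[:50])), len(set(labels[50:])), labels[0] != labels[50])  # 1 1 True
print(referee_ll[0] < train_ll.min())                       # True: flagged as an outlier
```

This toy case also shows where the approach is fragile. At kick-off there are few samples to fit, and a poor colour space or lighting changes can pull the two components together.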
A wise man once said: “If you don’t know what to do, use neural networks.” To our surprise, there are only a few publications on the topic of color recognition. We will walk you through our approaches and their architectures, beginning with why treating our problem as color classification may not be enough in some cases, how colorspace regression can be extended and customized, and why it overfits easily. Next, we describe how color can be incorporated as an auxiliary input to a neural network and why this gives unsatisfactory results. Finally, we will focus on how metric (color) learning can solve the player clustering problem, and what to do when it (still) doesn’t want to.
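The abstract does not say which metric-learning objective is used. A common choice is the triplet loss, sketched below as an assumption rather than the authors' method. Training pulls an anchor player's embedding towards a teammate and pushes it away from an opponent by at least a margin. At inference, players can then be assigned to the nearest team prototype, for example the mean embedding of each team (`assign_team` and the prototypes are hypothetical).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss pushing same-team pairs closer than different-team pairs by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # squared distance to a same-team player
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # squared distance to an opposing player
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

def assign_team(embeddings, prototypes):
    """At inference, assign each player embedding to the nearest team prototype."""
    d = np.linalg.norm(embeddings[:, None, :] - prototypes[None, :, :], axis=2)
    return d.argmin(axis=1)

a = np.array([[0.0, 0.0]])
print(triplet_loss(a, np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]])))  # 0.0: already separated
print(triplet_loss(a, np.array([[1.0, 0.0]]), np.array([[0.0, 0.0]])))  # 1.2: wrong ordering
print(assign_team(np.array([[0.1, 0.0], [0.9, 1.0]]), np.array([[0.0, 0.0], [1.0, 1.0]])))  # [0 1]
```

Because the loss only constrains relative distances, the network never needs to know the absolute kit colours. This is what makes the approach usable for teams it has not seen during training.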
Talk 22: From Big Data to Semantic Data in Industry 4.0
Szymon Bobek (Jagiellonian University), Grzegorz Nalepa (Jagiellonian University)
Advances in artificial intelligence are triggering transformations that lead more and more companies into the Industry 4.0 era.
In many cases these transformations are gradual and performed in a bottom-up manner.
This means that, in the first step, the industrial hardware is upgraded to collect as much data as possible, without any actual plan for using the information.
Second, the infrastructure for data storage and processing is prepared to keep large volumes of historical data accessible for further analysis.
Only in the last step are methods for processing the data developed, to improve the industrial and business processes or gain more insight into them.
Such a pipeline leaves many companies facing huge amounts of data and an incomplete understanding of how the data relate to the process that generates them.
In this talk we will present our work on improving this situation and bringing more understanding to the data, using the example of two industrial use cases: a coal mine and a steel factory.
# Abstracts of Accepted Posters
Poster 01: Automated bias detection in Polish media
Stanisław Bogdanowicz (University of Warsaw)
The amount of information spread every day exceeds the current capacity for verification and analysis. This study presents an NLP- and ML-based approach to automated bias detection in Polish media. The first part is a quantitative analysis of topic coverage by selected media profiles on Twitter, through keyword extraction, topic selection and grouping. The second part proposes applying sentiment analysis models to bias/polarity detection. The research is based on tweets posted between 01.07.2020 and 31.12.2020 by the verified profiles of the 12 most opinion-forming media outlets in Poland. Its aim is to develop an automated and effective methodology for bias detection in Polish media.
Poster 02: MAIR: Framework for mining relationships between research articles, strategies, and regulations in the field of explainable AI
Stanisław Giziński (MI2 Data Lab), Michał Kuźba (MI2 Data Lab), Bartosz Pieliński (University of Warsaw), Julian Sienkiewicz (Warsaw University of Technology), Przemysław Biecek (MI2 Data Lab)
The growing number of AI applications, including for high-stakes decisions, increases interest in Explainable and Interpretable Machine Learning (XI-ML). This trend can be seen both in the increasing number of regulations and strategies for developing trustworthy AI and in the growing number of scientific papers dedicated to the topic. To ensure the sustainable development of AI, it is essential to understand the dynamics of the impact of regulation on research papers, as well as the impact of scientific discourse on AI-related policies. This paper introduces a novel framework for the joint analysis of AI-related policy documents and eXplainable Artificial Intelligence (XAI) research papers. The collected documents are enriched with metadata and interconnections using various NLP methods combined with a methodology inspired by Institutional Grammar. Based on the information extracted from the collected documents, we showcase a series of analyses that help understand the interactions, similarities and differences between documents at different stages of institutionalisation. To the best of our knowledge, this is the first work to use automatic language analysis tools to understand the dynamics between XI-ML methods and regulations. We believe that such a system contributes to better cooperation between XAI researchers and AI policymakers.
Poster 03: Application of Machine Learning Techniques to Recipe Processing and Analysis
Bartłomiej Gajda (Warsaw University of Technology), Michał Grzeszczyk (Warsaw University of Technology)
The Internet, cookbooks, notes and screenshots are abundant sources of inspiration when preparing a meal. However, it is often difficult to store all recipes in one place and browse them: recipe content cannot be searched easily, and images take up a lot of storage space and are often blurry. We therefore present a deep learning method that transforms recipe images into serialised, labelled text. The resulting text can be stored in a database for search and further analysis. Our model is based on Bidirectional Encoder Representations from Transformers (BERT) and classifies blocks of text extracted through Optical Character Recognition into 5 classes: title, description, ingredients, steps and miscellaneous text. It is trained on nearly 80 thousand recipes whose labels were retrieved by web scraping from multiple cooking websites and matched to the text extracted from web page screenshots. The model achieves 95.04% accuracy on the test set. Our approach to creating serialised recipe text enables further recipe analysis. We use a Conditional Random Fields (CRF) model to extract meta tags of recipe ingredients (quantity, unit, ingredient name, etc.). From the meta tags and the recipe structure, the calorific value of the meal can be estimated. Analysing recipes in this manner can help people struggling with obesity or other food-related problems find recipes and meals suitable for them. Our solution is integrated into the Chefs’ mobile application, available on Google Play and Huawei App Gallery, which demonstrates the practical utility of the proposed method.
Keywords:
recipe, Optical Character Recognition, Bidirectional Encoder Representations from Transformers, Conditional Random Fields, calorific value
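Once a CRF has tagged the tokens of each ingredient line, estimating calories is a matter of grouping tokens into fields, converting units to grams and looking up energy density. The sketch below is a minimal illustration under stated assumptions. The tag names (`QTY`, `UNIT`, `NAME`) and the tagged input are hypothetical. The unit table is illustrative, and the kcal-per-100 g values are approximate reference figures, not data from the poster.

```python
from collections import defaultdict

# Hypothetical tagger output: one (token, tag) pair per token of an ingredient line.
tagged_lines = [
    [("200", "QTY"), ("g", "UNIT"), ("flour", "NAME")],
    [("2", "QTY"), ("tbsp", "UNIT"), ("sugar", "NAME")],
    [("50", "QTY"), ("g", "UNIT"), ("unsalted", "NAME"), ("butter", "NAME")],
]

# Illustrative lookup tables, not taken from the poster.
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "tbsp": 12.5}   # tbsp weight really depends on the ingredient
KCAL_PER_100G = {"flour": 364, "sugar": 387, "unsalted butter": 717}  # approximate reference values

def to_record(tagged):
    """Group the tokens of one ingredient line by tag into quantity/unit/name fields."""
    fields = defaultdict(list)
    for token, tag in tagged:
        fields[tag].append(token)
    return {
        "quantity": float(fields["QTY"][0]),
        "unit": fields["UNIT"][0],
        "name": " ".join(fields["NAME"]),
    }

def estimate_kcal(records):
    """Sum calories over ingredients: grams / 100 * kcal per 100 g."""
    total = 0.0
    for r in records:
        grams = r["quantity"] * GRAMS_PER_UNIT[r["unit"]]
        total += grams / 100 * KCAL_PER_100G[r["name"]]
    return total

records = [to_record(line) for line in tagged_lines]
print(records[2])              # {'quantity': 50.0, 'unit': 'g', 'name': 'unsalted butter'}
print(estimate_kcal(records))  # 1183.25
```

Volume units such as tablespoons are the weak point. A realistic system would need per-ingredient densities rather than the single conversion factor assumed here.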
Poster 04: Factors determining rates of return on shares of banks listed on the Warsaw Stock Exchange
Bartosz Szabłowski (Nestlé)
Models that predict stock movements are considered the holy grail of investing. Of course, no such perfect model exists, but Michael Burry predicted the 2007 economic crisis based on data, while Jim Simons and Edward Thorp based their investments on statistics and machine learning. You do not have to be right every time; what matters is being slightly better than a coin flip. Are machine learning models the prophets of today? Probably yes, although, as Thorp himself noted, models keep working only as long as no one else uses them. In my presentation I will show two models that use fundamental-analysis, technical-analysis and sector data. The first model predicts profit or loss over different time horizons, while the second uses reinforcement learning to decide which stock to buy, sell or hold. I will present the idea behind these two models and the advantages of machine learning models for time series (company stock prices) compared with statistical models (ARIMA and similar). In addition, I will share the results of the models on a test set as well as in daily operation, and present predictions for the future. The presentation will conclude with an analysis aimed at understanding the models, for example how feature importance changes over longer time horizons. I will answer the question of whether, in practice, fundamental variables drive long-term investments and technical variables shorter ones, show how the agent's policy has changed over time, and discuss what to watch out for in similar projects. I will try to answer all questions; the stock market and data science are my passions.
Poster 05: Information Extraction from Historic Sources for the Costumology Purposes
Urszula Kuczma (University of Warsaw), Marcin Lewandowski (University of Warsaw)
This poster presents an application of information extraction to costumology research (the methodology can also work for other linguistically specific fields). Costumology specialists need to go through a large amount of literature to find relevant descriptions of apparel, sometimes working with short sentences or mere mentions. In our task we focused on automating this search, summarising books into a set of fragments that describe apparel, each titled with its key element. Unlike state-of-the-art approaches, we did not want to create a neural network that would need to be pre-trained on similar texts, because that would be more time-consuming and would require large computational power. Our aim was to provide a simple solution that, thanks to its speed, could be easily tweaked. We proposed shifting the focus from the text itself towards creating a comprehensive dictionary that could be applied as a pattern to the Matcher component of spaCy (the domain is narrow and specific, so preparing such a dictionary was manageable and seemed the faster method). The text is tokenized and parsed for dependencies, so that keywords in sentences that include their descriptions are flagged as the most important (primary search results). All results without a description are kept in a secondary list (in case the researcher needs to see all occurrences of an element in the text). After completing the task, we asked a costumologist for an opinion on the relevance of the tool, using Jane Austen’s “Pride and Prejudice” as an example. The tool was welcomed as useful, with the remark that all the commonly used fragments were successfully flagged. Our proposal aimed to solve a niche problem; sometimes seemingly unrelated industries can gain a lot from language processing, and recognising such needs is an important part of the learning process.
The project was created under the supervision of Dr Adam Zadrożny. It received the highest grade (5!) as a semester project in the Master's programme in Cognitive Science at the University of Warsaw. All people involved agreed to the work being presented at the conference.
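The primary/secondary split can be illustrated without any NLP library. In spaCy itself, the dictionary would typically be loaded into a `Matcher` or `PhraseMatcher`, and descriptions would be read from a trained pipeline's dependency parse. The standard-library stand-in below replaces the dependency check with a crude rule: a descriptor word within a few tokens before the keyword. Both word lists and the window size are hypothetical.

```python
import re

# Hypothetical dictionaries; the poster's real dictionary is not reproduced here.
APPAREL = {"gown", "muslin", "bonnet", "pelisse", "gloves"}
DESCRIPTORS = {"white", "spotted", "new", "fine", "dirty", "pink"}

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_apparel(text, window=2):
    """Return (primary, secondary) lists of (keyword, sentence) pairs.

    A mention is 'primary' when a descriptor word appears within `window` tokens
    before it, a crude stand-in for a dependency-based check."""
    primary, secondary = [], []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in APPAREL:
                described = any(t in DESCRIPTORS for t in tokens[max(0, i - window):i])
                (primary if described else secondary).append((token, sentence))
    return primary, secondary

text = "She wore a spotted muslin. Her bonnet was on the chair. He bought gloves."
primary, secondary = find_apparel(text)
print(primary)                    # [('muslin', 'She wore a spotted muslin.')]
print([k for k, _ in secondary])  # ['bonnet', 'gloves']
```

The design choice mirrors the poster's reasoning: with a narrow domain, a curated word list and a cheap rule can be rerun and tweaked in seconds, with no training step.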
Poster 06: IPC 2.0: prediction of isoelectric point and pKₐ dissociation constants
Lukasz P. Kozlowski (University of Warsaw)
IPC 2.0 (Isoelectric Point Calculator 2.0) is a web service and a standalone program for estimating protein and peptide isoelectric points (pI) and dissociation constants (pKa) using a mixture of deep learning and support vector regression models. The isoelectric point, the pH at which a particular molecule carries no net electrical charge, is a critical parameter for many analytical biochemistry and proteomics techniques, especially 2D gel electrophoresis (2D-PAGE), capillary isoelectric focusing (cIEF), X-ray crystallography, and liquid chromatography–mass spectrometry (LC-MS). In benchmarks, IPC 2.0 achieves a lower prediction error (RMSD) than previous algorithms for both proteins and peptides: 0.848 versus 0.868 and 0.222 versus 0.405, respectively. Moreover, IPC 2.0 predicts pKa from sequence information alone more accurately than structure-based methods do (0.576 versus 0.826), and several times faster.
IPC 2.0 is available at http://www.ipc2-isoelectric-point.org
Reference: Kozlowski LP (2021) IPC 2.0: prediction of isoelectric point and pKa dissociation constants. Nucleic Acids Res. DOI: https://doi.org/10.1093/nar/gkab295
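For context, the classical, non-ML way to compute a pI is to sum Henderson-Hasselbalch charges of the ionisable groups and bisect on pH until the net charge crosses zero. The sketch below does exactly that with an illustrative EMBOSS-style pKa table. It is the kind of baseline IPC 2.0 improves on, not IPC 2.0 itself, which predicts the pKa values with ML models. Different pKa tables give noticeably different pIs.

```python
# Illustrative pKa table (EMBOSS-style values); IPC 2.0 itself predicts pKa values with ML models.
PKA_BASIC = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(sequence, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    basic = ["N_term"] + [aa for aa in sequence if aa in PKA_BASIC]
    acidic = ["C_term"] + [aa for aa in sequence if aa in PKA_ACIDIC]
    positive = sum(1 / (1 + 10 ** (ph - PKA_BASIC[g])) for g in basic)
    negative = sum(1 / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g in acidic)
    return positive - negative

def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: the net charge decreases monotonically as pH increases."""
    sequence = sequence.upper()
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid        # still positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2

print(round(isoelectric_point("G"), 2))  # 6.1: midpoint of the two terminal pKa values
print(round(isoelectric_point("KK"), 2), round(isoelectric_point("DE"), 2))
```

For a residue with no ionisable side chain, the pI is exactly the mean of the N- and C-terminal pKa values, which gives a convenient sanity check. Lysine-rich peptides come out basic and Asp/Glu-rich peptides acidic.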
Poster 07: Guide through jungle of models! forester: An R package to automatically select between tree-based models
Anna Kozak (MI2DataLab, Warsaw University of Technology), Hoang Thien Ly (Warsaw University of Technology), Szymon Szmajdziński (Warsaw University of Technology), Przemysław Biecek (Warsaw University of Technology)
The broad range of machine learning (ML) applications has led to an ever-growing demand for ML systems that perform well and are easy to train and deploy. However, designing an ML model for a specific task is an arduous, time-consuming process. Several AutoML solutions, such as mlr3, caret and H2O, have been developed to address this problem. However, their different syntaxes and data-format requirements may mean yet another daunting round of reading lengthy documentation.
To minimise these drawbacks, and above all to create an open-source package that automates a quick feedback loop for ML models, we created the {forester} package. It offers functions that automatically encapsulate the important steps of the ML pipeline: data preprocessing, feature engineering, model creation, hyperparameter optimisation and model evaluation. The forester package is well connected with the DALEX package, which integrates model training with model explanations to increase confidence in deploying the best models. A major benefit of the package is its user-friendly interface, which standardises the grammar of different well-known tree-based models, such as XGBoost, ranger, CatBoost and LightGBM, into one unified formula. The source code and a detailed description of the package are available at: https://github.com/ModelOriented/forester
Poster 08: An Imitation learning approach for the task of coronary artery centerline extraction from CT volumes
Bartłomiej Cupiał (LifeFlow), Tomasz Konopczyński (LifeFlow), Piotr Miłoś (University of Warsaw)
Intro & Motivation:
Coronary Artery Disease (CAD) is the most common type of heart disease, and heart disease is the leading cause of death in Europe and the United States.
A computed tomography (CT) scan is one of the least invasive ways to evaluate patients with suspected CAD. Displaying coronary arteries for diagnostic purposes usually requires Curved Planar Reformation (CPR) of the coronary artery. Since CPR depends on a precomputed centerline of the coronary artery, it is very important to have a technique that accurately extracts centerlines from CT patient data.
Method:
In this work we investigate the use of an imitation learning framework for the task of coronary artery centerline detection.
We propose a novel imitation learning setup utilizing CT scans with previously prepared centerlines as targets. We design, develop and discuss the environment, the agent, and the trainer.
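One way such a setup can obtain expert demonstrations is from an oracle that knows the annotated centerline. The sketch below is a hypothetical design, not the authors' environment: the oracle's action is a unit step towards the centerline point a few samples ahead of the closest one. Rolling it out yields (state, action) pairs for behavioural cloning. Here the "state" is just the 3-D position. A real agent would observe a CT patch around that position.

```python
import numpy as np

def expert_action(position, centerline, lookahead=3):
    """Oracle move: unit step towards the centerline point `lookahead` samples
    ahead of the closest one (hypothetical design)."""
    dists = np.linalg.norm(centerline - position, axis=1)
    nearest = int(np.argmin(dists))
    target = centerline[min(nearest + lookahead, len(centerline) - 1)]
    direction = target - position
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else np.zeros_like(direction)

def make_demonstrations(centerline, step=1.0, lookahead=3, max_steps=200):
    """Roll out the oracle from the first centerline point, recording (state, action) pairs."""
    pos = centerline[0].astype(float)
    states, actions = [], []
    for _ in range(max_steps):
        action = expert_action(pos, centerline, lookahead)
        if not action.any():
            break
        states.append(pos.copy())
        actions.append(action)
        pos = pos + step * action
        if np.linalg.norm(pos - centerline[-1]) < step:   # reached the vessel end
            break
    return np.array(states), np.array(actions)

line = np.stack([np.arange(21.0), np.zeros(21), np.zeros(21)], axis=1)  # straight toy vessel along x
states, actions = make_demonstrations(line)
print(len(states), actions[0])  # 20 [1. 0. 0.]
```

A look-ahead target rather than the nearest point keeps the oracle moving forward along the vessel instead of oscillating around the closest centerline sample.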
Results:
On our test data, the setup extracts centerlines with a sensitivity of 0.744, which is sufficient to find the most clinically relevant parts of the coronary arteries.
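The abstract does not define how sensitivity is computed for a centerline. A common convention, assumed here, is the fraction of reference centerline points that have an extracted point within a distance tolerance. The tolerance value and the helper name are illustrative.

```python
import numpy as np

def centerline_sensitivity(reference, extracted, tolerance=1.0):
    """Fraction of reference centerline points with an extracted point within `tolerance`."""
    if len(extracted) == 0:
        return 0.0
    d = np.linalg.norm(reference[:, None, :] - extracted[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= tolerance))

reference = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
extracted = reference[:5] + np.array([0.0, 0.1, 0.0])   # only the first half is tracked
print(centerline_sensitivity(reference, extracted))     # 0.5
```

Under this definition, a sensitivity of 0.744 means roughly three quarters of the annotated centerline is covered. That is consistent with the abstract's point that the clinically important proximal segments can be found even if distal branches are missed.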
Poster 09: Optimising N-BEATS for the NVIDIA Time-Series Prediction Platform
Jolanta Mozyrska (University of Warsaw), Tomasz Cheda (University of Warsaw)
Recent advancements in deep learning have introduced new methods into the field of time series analysis and forecasting. N-BEATS (introduced at ICLR 2020) is the first pure deep learning model to outperform state-of-the-art statistical approaches on several widely used datasets, including M4 competition datasets.
As part of our thesis, in partnership with NVIDIA, we connected N-BEATS to the Time-Series Prediction Platform, a newly released NVIDIA tool that enables engineers to compare time series models precisely. We reproduced the results reported in the N-BEATS paper. Furthermore, we re-implemented N-BEATS, tuning and optimising its throughput, and obtained significant performance gains with no loss of accuracy.
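For readers unfamiliar with N-BEATS, the sketch below shows the forward pass of its doubly residual stacking with generic blocks. It uses random weights and is not the optimised re-implementation described above. Each block reads the current residual of the look-back window and emits a backcast, which is subtracted from the residual, and a partial forecast, which is added to the total. As a simplifying assumption, the paper's expansion-coefficient layer and linear basis are merged into a single linear head, and the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_block(backcast_len, forecast_len, hidden=64, layers=4):
    """Random weights for one generic block: a ReLU MLP plus two linear heads."""
    sizes = [backcast_len] + [hidden] * layers
    mlp = [(rng.normal(0, 0.1, (i, o)), np.zeros(o)) for i, o in zip(sizes[:-1], sizes[1:])]
    head_backcast = rng.normal(0, 0.1, (hidden, backcast_len))
    head_forecast = rng.normal(0, 0.1, (hidden, forecast_len))
    return mlp, head_backcast, head_forecast

def block_forward(x, block):
    mlp, head_backcast, head_forecast = block
    h = x
    for w, b in mlp:
        h = np.maximum(0.0, h @ w + b)   # fully connected layer + ReLU
    return h @ head_backcast, h @ head_forecast

def nbeats_forward(x, blocks):
    """Doubly residual stacking: each block explains part of the input window
    (backcast, subtracted from the residual) and adds its partial forecast."""
    residual = x
    forecast = np.zeros((x.shape[0], blocks[0][2].shape[1]))
    for block in blocks:
        backcast, partial = block_forward(residual, block)
        residual = residual - backcast
        forecast = forecast + partial
    return forecast

lookback, horizon = 24, 6
blocks = [init_block(lookback, horizon) for _ in range(3)]
windows = rng.normal(size=(8, lookback))      # batch of 8 input windows
print(nbeats_forward(windows, blocks).shape)  # (8, 6)
```

The whole model is a chain of dense matrix multiplications on fixed-size windows. This is why throughput work for N-BEATS typically centres on batching and efficient dense kernels rather than on the architecture itself.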
Poster 11: Artefact removal from 3D coronary artery segmentation from CT via point cloud neural network
Patryk Rygiel (LifeFlow), Tomasz Konopczyński (LifeFlow), Maciej Zięba (Tooploox, Wrocław University of Science and Technology)
Coronary artery disease (CAD) is one of the most common types of heart disease in the European Union and the United States. Computed tomography (CT) scans are therefore an important aid in diagnosing a range of cardiovascular conditions, such as CAD.
Distinguishing coronary artery vessels from similar-looking objects after segmentation is essential for automating the diagnosis process.
Such problems are usually addressed with volumetric 3D U-Net architectures, which are well known to be memory-intensive.
In this work we propose to utilize a point cloud neural network to process volumetric data, which is not only less memory-intensive, but also shows better results.
We present a method capable of distinguishing between coronary artery vessels and objects similar to them via utilization of point cloud neural network for the task of binary 3D object classification.
We show and discuss the improvement over a voxel-based solution.
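The memory advantage comes from the input representation. A segmented object can be turned into a small, fixed-size set of 3-D points instead of a full voxel grid. The sketch below shows one plausible conversion, assuming voxel coordinates scaled by the scan spacing, random sampling to a fixed point count, and centring. The point count and helper name are assumptions, and the classifier network itself is not shown.

```python
import numpy as np

def mask_to_point_cloud(mask, spacing=(1.0, 1.0, 1.0), n_points=2048, seed=0):
    """Convert a binary 3-D segmentation mask into a fixed-size, centred point cloud."""
    coords = np.argwhere(mask) * np.asarray(spacing)   # voxel indices -> physical units
    if len(coords) == 0:
        raise ValueError("empty mask")
    rng = np.random.default_rng(seed)
    replace = len(coords) < n_points                  # small objects: sample with repetition
    idx = rng.choice(len(coords), size=n_points, replace=replace)
    points = coords[idx]
    return points - points.mean(axis=0)               # centre the object at the origin

mask = np.zeros((64, 64, 64), dtype=bool)
mask[20:30, 20:30, 20:30] = True                      # a 10x10x10 toy "vessel fragment"
cloud = mask_to_point_cloud(mask, spacing=(0.5, 0.5, 0.5), n_points=512)
print(cloud.shape)  # (512, 3)
```

For scale, a 512 x 512 x 512 volume of one-byte voxels occupies about 134 MB. A 2048-point cloud stored as float32 takes 2048 x 3 x 4 = 24,576 bytes.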
Poster 10: Generative Gaussian Mixture Models for Data Imbalance
Iwo Naglik (Poznan University of Technology), Mateusz Lango (Poznan University of Technology)
Imbalanced learning, sometimes called learning from long-tailed class distributions, arises in many practical problems, including predictive maintenance, anomaly detection and medical image classification. Among the many proposed algorithms, the most popular are resampling methods, which construct more balanced training sets by duplicating or removing instances, or by constructing new examples as linear interpolations of existing ones. Recently, a new no-free-lunch theorem for imbalanced learning indicated that finding a “one-size-fits-all” resampling method is impossible. Unfortunately, currently proposed methods provide no guidelines on the types of data distribution for which they work well.
In this work, we propose a new resampling technique called GGMM under- and oversampling that 1) assumes a concrete probabilistic model of the data, which clearly indicates the type of data the method prefers and can be checked on real data, e.g. with statistical tests; 2) unlike other methods, constructs completely new training instances for minority classes that are neither simple duplicates nor linear interpolations of existing instances; 3) combines the benefits of undersampling and oversampling; and 4) leveraging knowledge of the theoretically and experimentally studied so-called data difficulty factors, strengthens large and difficult subconcepts of the minority class and removes the most harmful majority-class instances.
The performed experimental evaluation on a handful of real datasets demonstrates that the method surpasses state-of-the-art methods like MDO, StaticSMOTE, SOUP, or CCR on G-mean and F-score.
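The core difference from SMOTE-style interpolation can be shown with a drastically simplified, single-Gaussian variant. This is a sketch of the general generative-resampling idea, not the authors' GGMM, which uses a Gaussian mixture and data difficulty factors. New minority points are drawn from a Gaussian fitted to the minority class. The majority instances removed are those lying deepest inside that Gaussian (smallest Mahalanobis distance), as a stand-in for the "most harmful" ones.

```python
import numpy as np

def gaussian_resample(X_min, X_maj, n_new, n_remove, seed=0, reg=1e-6):
    """Simplified single-Gaussian generative resampling (not the GGMM algorithm itself)."""
    rng = np.random.default_rng(seed)
    mean = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False) + reg * np.eye(X_min.shape[1])
    synthetic = rng.multivariate_normal(mean, cov, size=n_new)   # brand-new minority points
    # undersampling: drop the majority points lying deepest inside the minority Gaussian
    diff = X_maj - mean
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    keep = np.argsort(maha)[n_remove:]
    return np.vstack([X_min, synthetic]), X_maj[np.sort(keep)]

rng = np.random.default_rng(1)
X_min = rng.normal([2.0, 2.0], 0.5, size=(20, 2))
X_maj = rng.normal([0.0, 0.0], 1.0, size=(200, 2))
new_min, new_maj = gaussian_resample(X_min, X_maj, n_new=80, n_remove=40)
print(new_min.shape, new_maj.shape)  # (100, 2) (160, 2)
```

Because the generator is an explicit density, its assumption (here, one Gaussian per minority class) can be checked against the data with a statistical test. This is the property the abstract highlights in point 1.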
Poster 24: Writing saves you time, build Zettelkasten!
Piotr Januszewski (Gdańsk University of Technology)
Once you write something down, you can remix and reuse the ideas for the rest of your life. While the number of artefacts grows linearly, the number of possible connections between them grows much faster (quadratically, for pairwise links)! At some point, publishing can become as enjoyable as building something new from Lego pieces you already own. I will introduce you to a system for collecting and organising your ideas and reading notes in a way that enables this proficiency: the Zettelkasten method.
Niklas Luhmann, a sociologist famous for his extensive use of the method, credited his Zettelkasten with enabling his extraordinarily prolific writing. He published over 70 books and 400 scholarly articles during his working life. Luhmann built up a Zettelkasten of some 90,000 index cards for his research.
I will start with a short introduction to the method, using my own Zettelkasten as an example, and then jump straight into practical tips for implementing it in your research. I will show you how it combines with taking literature notes that you will actually use!
Poster 12: metaMIMIC: analysis of hyperparameter transferability for tabular data using MIMIC-IV database
Katarzyna Woźnica (Mi2DataLab, Warsaw University of Technology), Mateusz Grzyb (Warsaw University of Technology), Zuzanna Trafas (Poznan University of Technology), Przemysław Biecek (Warsaw University of Technology)
The performance of boosting algorithms is highly sensitive to the choice of hyperparameter values, and tuning them is computationally expensive and time-consuming. Hyperparameter transfer is one answer to the strong demand for more efficient methods of algorithm tuning. It has been shown that using model performance results from unrelated tasks allows faster optimisation than traditional approaches, but few attempts have been made to determine what affects transfer capability. To fill this gap, we create a benchmark of medical machine learning tasks based on the MIMIC-IV database and consider several scenarios of structural task resemblance that reflect real-life use cases. The results suggest that structural similarity enhances the transferability of hyperparameters and speeds up the tuning of machine learning algorithms for a specific prediction problem. From a practical perspective, the conclusions support the hypothesis that building a domain meta-data repository is beneficial.
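One of the simplest transfer strategies, sketched here as an illustration rather than the benchmark's method, is to order candidate configurations for a new task by their mean rank on previously seen source tasks, then evaluate them in that order. The task names, configurations and scores below are hypothetical. The sketch assumes every configuration was evaluated on every source task.

```python
def ranks(scores):
    """Rank configurations on one task by score (higher is better); 1 = best."""
    ordered = sorted(scores, key=lambda cfg: (-scores[cfg], cfg))
    return {cfg: position + 1 for position, cfg in enumerate(ordered)}

def warm_start_order(history):
    """Order configurations by mean rank across source tasks (ties broken by name)."""
    per_task = [ranks(scores) for scores in history.values()]
    configs = sorted(per_task[0])
    mean_rank = {c: sum(r[c] for r in per_task) / len(per_task) for c in configs}
    return sorted(configs, key=lambda c: (mean_rank[c], c))

# Hypothetical validation AUCs of three configurations on two source tasks.
history = {
    "mortality":   {"depth=3": 0.81, "depth=6": 0.84, "depth=9": 0.83},
    "readmission": {"depth=3": 0.70, "depth=6": 0.73, "depth=9": 0.69},
}
print(warm_start_order(history))  # ['depth=6', 'depth=3', 'depth=9']
```

Ranks rather than raw scores are averaged so that tasks with very different metric scales contribute equally. Restricting `history` to structurally similar tasks is where the poster's finding about similarity would come in.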
Poster 13: Estimating the Secondary Attack Rate of COVID-19 using Twitter data [UNPUBLISHED RESEARCH]
Tomasz Czernuszenko (University College London), Ingemar Cox (University College London), Vasileios Lampos (University College London), Elad Yom-Tov (Microsoft Israel)
The COVID-19 pandemic has exposed numerous difficulties in objectively assessing the extent of epidemics, even in developed countries. The familial Secondary Attack Rate (fSAR), the probability that a contagious person infects a household member, is a key parameter of disease models that inform public health policies. Unlike traditional methods of estimating SAR and fSAR, which rely on recruiting a cohort, estimating fSAR indirectly via Twitter posts could lower the time and cost of measuring fSAR, provide estimates for geographically finer regions, and do so at regular time intervals, e.g. weekly or monthly.
Our work uses Twitter content to determine whether a user or members of their family or household were infected with COVID-19. Among users who report being ill with COVID, we also search for mentions of whether their family members also contracted the disease; the fSAR can then be inferred as the ratio of the number of users whose family members also contracted the virus to the number of users who report having COVID. Early results for the UK using a preliminary approach show that fSAR ranges from 0.04 in April 2020 to 0.55 in April 2021, which is consistent with fSAR estimates (0.04-0.53) obtained internationally using traditional methods, although further calibration is needed to align with the values reported by Public Health England. We are currently improving various aspects of the methodology, including the text-based classifiers and the probabilistic framework for estimating fSAR from Twitter activity. We also plan to evaluate our method in other countries (e.g. the United States).
Poster 14: Sobriety detection by voice analysis using machine learning methods
Mikołaj Najda (AGH University of Science and Technology), Jakub Żółkoś (Boston University), Daria Hemmerling (AGH University of Science and Technology)
In 2020, Polish road users intoxicated with alcohol were involved in 2540 fatal accidents (13.1% of all cases) and 2723 accidents that resulted in injuries (10.3%). According to data collected by the NHTSA, 10,142 people died as a result of alcohol-impaired driving in the United States in 2019. Our idea is to build a voice-based detection system that prevents a driver from driving any vehicle if there is any alcohol in their body. Voice samples were obtained from volunteers in a sober state and after alcohol consumption, when the concentration of alcohol in the exhaled air was greater than 0.05‰. Each participant had to describe a presented image for 20 seconds. Signal preprocessing included volume normalization, voice activity detection (VAD) and signal parametrization to obtain informative features for further classification. A quantitative acoustic vocal assessment of five distinct speech dimensions, related to phonation, prosody, non-linear features and speech timing, was performed. All participants were tested with a breathalyser. Random forest, XGBoost and CatBoost classifiers were applied, and the system’s performance was evaluated with accuracy and F1 score. The best result, an accuracy of almost 70%, was achieved by the random forest classifier.
Even a small amount of alcohol in the bloodstream substantially impairs a person’s reflexes and ability to act and think rationally. A widely available tool for checking whether a person has any alcohol in their body could protect many people from drunk drivers.
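The first two preprocessing steps can be sketched in a few lines. The abstract does not say which VAD was used, so the version below is an assumption: peak volume normalisation followed by a simple energy-based VAD. Frames whose RMS energy lies within a threshold (here 35 dB) of the loudest frame are marked as speech. The frame length, hop and threshold are illustrative values.

```python
import numpy as np

def normalize_volume(signal, target_peak=0.9):
    """Scale the recording so that its absolute peak equals `target_peak`."""
    peak = np.max(np.abs(signal))
    return signal if peak == 0 else signal * (target_peak / peak)

def energy_vad(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Mark frames whose RMS energy is within `threshold_db` of the loudest frame."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    rms = np.array([np.sqrt(np.mean(signal[i * hop:i * hop + frame] ** 2))
                    for i in range(n_frames)])
    db = 20 * np.log10(rms + 1e-12)
    return db > db.max() + threshold_db

sr = 16000
rng = np.random.default_rng(0)
silence = rng.normal(0.0, 1e-4, sr)                         # 1 s of near-silence
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s standing in for speech
speech_mask = energy_vad(normalize_volume(np.concatenate([silence, tone])), sr)
print(speech_mask[:98].any(), speech_mask[100:].all())      # False True
```

Only the frames marked as speech would then be passed on to feature extraction. This keeps pauses in the 20-second descriptions from diluting timing and prosody features.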
Poster 25: Analysis of sentiment of COVID19-related articles from two of the biggest Polish media news websites: TVN24 and TVP Info
Jędrzej Miecznikowski (University of Warsaw), Wiktoria Dołębska (University of Warsaw)
In this project, we examined the sentiment of over 4500 COVID-19-related articles from the two news websites, published between July 2020 and June 2021. We wanted to see whether and how sentiment differs within each outlet over time, and to examine potential differences between the two outlets. To obtain more specific results, and based on our initial intuitions, the articles were categorised according to whether they dealt with national or world affairs. Additionally, we used and compared several methods available for assessing sentiment in Polish. We found significant differences between the sentiment assessment methods, but our hypotheses about differences between the outlets, both over time and by category, were not confirmed. We propose next steps for the sentiment assessment of Polish news articles and highlight the importance of such research.
Poster 27: Mbaza - machine learning used in wildlife protection. Case study
Jędrzej Świeżewski (Appsilon)
Mbaza is an open source application using machine learning to aid wildlife protection. It is currently being used by park rangers and bioconservationists in Gabon and tested in other countries.
During the presentation I will highlight some of the challenges we had to overcome when creating Mbaza, with a particular focus on building the machine learning model, but also on the peculiarities of building an application that needs to operate in the wild (in the second largest tropical rainforest in the world).
The presentation will involve magnificent photos of wildlife taken in Central Africa and used for the training and evaluation of our models.
Poster 28: Scalable Intent Classification: Embeddings vs. Classifier
Helena Skowrońska (Yosh.AI), Arleta Staszuk (Yosh.AI), Paweł Wnuk (Yosh.AI)
The task of multi-class text classification finds a practical application in FAQ chatbots, among others. However, training a classifier usually requires large labeled datasets of multiple life-like user queries for each FAQ intent. Furthermore, such classifiers need to be retrained for each update to the set of the supported questions. Our proposed solution consists in a strategy for scalable FAQ intent classification, inspired by the recent developments in few-shot learning.
Our FAQ chatbot strategy combines an SVM intent classifier with cosine similarity between sentence embeddings. The embeddings are calculated for each question in our training dataset (about 55k questions classified into about 80 intents) for each of our two languages, English and Polish. To select the best embedding model, we conducted experiments with multiple sentence-transformers models from the Hugging Face repository. With the embeddings in place, classifying a user query consists simply of calculating the embedding of the query and finding the most similar question in our dataset by computing the cosine similarities between the embeddings.
This strategy is highly scalable, because it does not require any retraining. Adding or removing intents involves calculating the embeddings for the new intents or removing those of the unwanted ones, without any change to the already existing embeddings. What is more, our approach does not even require defining the intents – it is enough to have ordinary FAQ content.
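The similarity-matching half of this strategy can be sketched as a nearest-question lookup. In this minimal sketch, tiny hand-written vectors stand in for sentence-transformers embeddings so it runs without dependencies; the class and method names (`EmbeddingIndex`, `add`, `remove_intent`, `classify`) are hypothetical, and the SVM half of the combined system is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class EmbeddingIndex:
    """Stores (intent, question embedding) pairs; no model is ever retrained."""

    def __init__(self):
        self.entries = []

    def add(self, intent, embedding):
        # adding an intent only appends its question embeddings
        self.entries.append((intent, list(embedding)))

    def remove_intent(self, intent):
        # removing an intent leaves all other embeddings untouched
        self.entries = [e for e in self.entries if e[0] != intent]

    def classify(self, query_embedding):
        """Return (intent, score) of the most similar stored question."""
        best_intent, best_score = None, -1.0
        for intent, emb in self.entries:
            score = cosine(query_embedding, emb)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent, best_score


index = EmbeddingIndex()
index.add("delivery", [0.9, 0.1, 0.0])
index.add("payment", [0.1, 0.9, 0.1])
print(index.classify([0.8, 0.2, 0.0]))
```

Adding or removing an intent only touches the list of stored vectors, which is the property the abstract credits for the approach's scalability.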
Classification results (micro F1) obtained on a separate test set:

| Method | Polish | English |
|---|---|---|
| Embeddings only | 0.81 | 0.80 |
| SVM | 0.85 | 0.86 |
| SVM with embeddings | 0.87 | 0.87 |
In our production FAQ solution, we combine the two approaches for the best results and scalability. The SVM (with embeddings) supports the common core of retail intents (delivery, payment, etc.), while embeddings with similarity matching cover intents that are unique to each retailer.
Poster 15: Fact-checking must go on!
Albert Sawczyn (Politechnika Wrocławska), Denis Janiak (Politechnika Wrocławska), Jakub Binkowski (Politechnika Wrocławska), Łukasz Augustyniak (Politechnika Wrocławska), Tomasz Kajdanowicz (Politechnika Wrocławska)
In recent years, the overwhelming problem of the dissemination of falsehoods has raised concerns among many researchers. The negative effects of this phenomenon can be observed in the most critical areas of human life, such as politics and healthcare. A widespread response to this issue comes from fact-checking organisations, which manually verify information appearing in the media. However, their work is labour-intensive, leading to high costs and low efficiency. Moreover, automating this process is challenging, especially for low-resource languages, and remains an unsolved problem. In this work, we therefore examined the possibility of developing methods that assist fact-checkers in their central tasks: relevance assessment and veracity classification of textual information. To increase research efforts in less popular languages, the methods were developed for Polish and relied on our own curated dataset. This study is one of the first for Polish, providing methods and a discussion of the difficulties and challenges in automated fact-checking.
Poster 16: Deep Neural Network Approach to Predict Properties of Drugs and Drug-Like Molecules
Magdalena Wiercioch (Jagiellonian University), Johannes Kirchmair (University of Vienna)
The discovery of small molecules with desirable properties is an essential issue in chemistry, and progress here could considerably speed up research in various domains such as virtual screening and drug design. There is, however, a series of open challenges, including building proper representations of molecules for machine learning algorithms.
To address this issue, we propose a deep neural network-based architecture that learns molecular representations to improve molecular property prediction. We use two separate blocks of operations, each of which learns a representation. The first block models a molecule as an undirected graph; its core is a stack of attention layers followed by a fully connected layer that forms features from the molecular graph. The second block converts a molecule into molecular fingerprints, and a deep learning algorithm is then applied to this sequence to learn a representation. The two resulting feature vectors are concatenated and fed into fully connected layers. The final layer is a regression or classification layer that outputs the property value.
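The data flow of the two-block design can be sketched as a forward pass. This is a minimal, dependency-free sketch under several assumptions: a single-head GAT-style attention layer stands in for the authors' attention stack, a two-layer perceptron stands in for the unspecified deep learning algorithm applied to the fingerprint, weights are random rather than trained, and all names (`TwoBlockModel`, `attention_layer`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def attention_layer(features, adjacency, W, a_src, a_dst):
    """Single-head GAT-style layer: each atom attends over its bonded neighbours and itself."""
    h = [matvec(W, f) for f in features]
    out = []
    for i in range(len(h)):
        neigh = [j for j in range(len(h)) if adjacency[i][j] or i == j]
        scores = []
        for j in neigh:
            e = sum(a * x for a, x in zip(a_src, h[i])) + sum(a * x for a, x in zip(a_dst, h[j]))
            scores.append(e if e > 0 else 0.2 * e)  # LeakyReLU on raw scores
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        agg = [sum(w / total * h[j][k] for w, j in zip(weights, neigh)) for k in range(len(h[i]))]
        out.append(relu(agg))
    return out


class TwoBlockModel:
    """Graph-attention block and fingerprint block, concatenated, then a linear output head."""

    def __init__(self, atom_dim, fp_bits, hidden=8, n_attention=2, seed=0):
        rng = random.Random(seed)
        self.attention = []
        d = atom_dim
        for _ in range(n_attention):
            W = rand_matrix(rng, hidden, d)
            a_src = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
            a_dst = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
            self.attention.append((W, a_src, a_dst))
            d = hidden
        self.graph_fc = rand_matrix(rng, hidden, hidden)
        self.fp_fc1 = rand_matrix(rng, hidden, fp_bits)
        self.fp_fc2 = rand_matrix(rng, hidden, hidden)
        self.head = rand_matrix(rng, 1, 2 * hidden)[0]

    def forward(self, atom_features, adjacency, fingerprint):
        h = atom_features
        for W, a_src, a_dst in self.attention:
            h = attention_layer(h, adjacency, W, a_src, a_dst)
        pooled = [sum(col) / len(h) for col in zip(*h)]  # mean readout over atoms
        graph_vec = relu(matvec(self.graph_fc, pooled))
        fp_vec = relu(matvec(self.fp_fc2, relu(matvec(self.fp_fc1, fingerprint))))
        joint = graph_vec + fp_vec  # list concatenation = feature concatenation
        return sum(w * x for w, x in zip(self.head, joint))  # regression output


model = TwoBlockModel(atom_dim=3, fp_bits=8)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy one-hot atom types
bonds = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # a three-atom chain
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]  # toy fingerprint bits
print(model.forward(atoms, bonds, fingerprint))
```

Because the graph block ends in a mean readout, relabelling the atoms leaves the prediction unchanged, which is a useful sanity check for any graph-based molecular model.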
A major advantage of our approach over previous methods is that our architecture applies a stacked attention mechanism and incorporates both atom-level and molecule-level attributes. We report extensive experiments on six datasets from the publicly available MoleculeNet benchmark. Our approach significantly outperforms state-of-the-art methods on both classification and regression tasks.
Poster 17: Explainable AI for Photon Science Data Reduction
Barbara Klaudel (Deutsches Elektronen-Synchrotron), Shah Nawaz (Deutsches Elektronen-Synchrotron), Vahid Rahmani (Deutsches Elektronen-Synchrotron), David Pennicard (Deutsches Elektronen-Synchrotron), Heinz Graafsma (Deutsches Elektronen-Synchrotron)
Our notion of life derives from our understanding of the molecular structure of matter. Molecular structure models are obtained from X-ray diffraction experiments. However, these experiments produce data at a very high rate, rapidly exceeding storage capacity, and only 5-10% of the data is useful for further analysis. This problem calls for a processing pipeline that filters out unusable images. Deep learning has only recently been proposed as a means of achieving this. Yet most studies neither address the limitations of black-box models nor justify their choice of architecture. Image selection has to take place in real time, so a model needs a good tradeoff between high specificity (to prevent information loss) and high performance (to analyse images in real time).
We present the results of an analysis of models with different architectures, in which accuracy, inference time, model depth, and key characteristics of failure cases were evaluated. We give a detailed layer-wise analysis of different models, assessing each layer's contribution to the final prediction. Moreover, we employed explainable AI methods, such as Grad-CAM and Inverting Visual Representations, to create a guideline for evaluating deep learning models in serial crystallography. Our study demonstrates that a shallower architecture can yield results as good as deeper ones while offering shorter inference time.
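Grad-CAM, one of the methods named above, weights each channel of a convolutional layer by the spatial average of the class-score gradient and keeps the positive part of the weighted sum. Obtaining activations and gradients requires a deep learning framework, so in this minimal sketch they are assumed to be given as nested lists shaped `[channels][height][width]`; the toy arrays are hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one convolutional layer, normalised to [0, 1].

    activations, gradients: nested lists shaped [channels][height][width];
    gradients hold d(class score)/d(activation) for the class of interest.
    """
    channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # channel importance weights: global average pooling of the gradients
    alphas = [sum(sum(row) for row in gradients[k]) / (height * width) for k in range(channels)]
    # weighted sum of activation maps, keeping only positive evidence (ReLU)
    cam = [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(channels))) for j in range(width)]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# toy layer: channel 0 supports the class, channel 1 argues against it
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))
```

In the crystallography setting, the resulting map would be upsampled onto the diffraction image to check whether a filter's decision rests on diffraction spots or on detector artefacts.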
Poster 30: A/B testing with challenging KPIs. Going beyond standard tests.
Armin Reinert (Empik Group), Kaja Marmołowska (Empik Group)
A/B test setups relatively often rely on normally approximated binomial tests and standard t-tests for both power analysis and significance testing of the impact. While these are sufficient for key performance indicators (KPIs) whose distributions satisfy the tests' assumptions, such approaches may prove inefficient for more challenging KPIs. As a result, we may use overinflated sample sizes or arrive at dubious statistical conclusions. Such issues are worrisome when the cost of a single treatment is high or when a misjudged rollout can have significant repercussions for the business. Based on our practical experience at Empik, we discuss using corrections to adjust for zero-inflated log-normal and sparse binomially distributed KPIs.
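The abstract does not spell out which corrections are used. One standard option for a zero-inflated log-normal KPI (for example, revenue per user, where most users spend nothing) is to model it as a conversion probability p times a log-normal spend, so that the mean is p·exp(μ + σ²/2), and to compare arms on the log of that mean with a delta-method variance. The following minimal sketch illustrates that generic technique, not necessarily Empik's procedure; the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def zero_inflated_lognormal_log_mean(values):
    """Return (log of estimated mean, delta-method variance of that log) for a
    non-negative KPI that is zero for some users and log-normal for the rest."""
    if any(v < 0 for v in values):
        raise ValueError("KPI values must be non-negative")
    n = len(values)
    logs = [math.log(v) for v in values if v > 0]
    k = len(logs)
    if k < 2:
        raise ValueError("need at least two non-zero observations")
    p = k / n            # share of non-zero observations
    mu = fmean(logs)     # mean of log(value) among non-zeros
    s2 = variance(logs)  # sample variance of log(value) among non-zeros
    log_mean = math.log(p) + mu + s2 / 2  # log E[X] = log p + mu + sigma^2 / 2
    # delta method: Var(log p) + Var(mu) + Var(s2 / 2), treating the terms as independent
    var_log_mean = (1 - p) / (n * p) + s2 / k + s2 ** 2 / (2 * (k - 1))
    return log_mean, var_log_mean


def compare_means(control, treatment):
    """Two-sided z-test on the log ratio of means; returns (ratio, z, p_value)."""
    lc, vc = zero_inflated_lognormal_log_mean(control)
    lt, vt = zero_inflated_lognormal_log_mean(treatment)
    diff = lt - lc
    z = diff / math.sqrt(vc + vt)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), z, p_value
```

The same variance expression can drive power analysis: solving for the n at which the expected log ratio exceeds the critical z times the standard error usually gives much smaller samples than a t-test on raw, heavily skewed values.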
Poster 18: DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders
Piotr Bródka (University of Trento, Clebre), Nicola Garau (University of Trento), Niccolo Bisagno (University of Trento), Nicola Conci (University of Trento)
Human pose estimation (HPE) is the task of localizing parts of the human body (joints) in images and videos. The field has many practical applications, such as surveillance systems (detecting unusual patterns of behaviour).
Despite advances in the field, some major issues still need to be addressed. Datasets usually provide side-view images, and most methods in the literature focus on them. Other viewpoints, such as the top view, also need to be addressed because they have practical applications, for example monitoring people from above when a side view suffers from too many occlusions. Generalization to viewpoints unseen at training time is a big challenge, and it is the topic of this paper.
Viewpoint equivariance is the property of a system that allows it to capture a transformation applied to the image and thus generalize to unseen viewpoints. In this paper, we leverage capsule networks (CapsNets) to achieve viewpoint equivariance. Their way of working can be compared to the mechanics of the human visual cortex.
Viewpoint transfer is a task in which we train on a dataset with one viewpoint and test on a dataset with another. No previous viewpoint-transfer results for the RGB domain have been presented in the literature. We performed such experiments and obtained satisfactory results.
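Capsules represent each part as a pose vector whose direction can change with the input instead of being pooled away, which is what makes equivariance possible. The abstract does not specify DECA's routing procedure, so this minimal sketch shows the generic squash non-linearity and routing-by-agreement from Sabour et al. (2017) as an illustration only. Here `u_hat[i][j]` is assumed to be lower capsule i's predicted pose vector for upper capsule j.

```python
import math


def squash(v):
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0:
        return [0.0] * len(v)
    scale = sq / (1 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement; returns (upper capsule vectors, final coupling coefficients)."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits
    for it in range(iterations):
        # coupling coefficients: softmax of each lower capsule's logits over upper capsules
        c = []
        for i in range(n_lower):
            top = max(b[i])
            e = [math.exp(x - top) for x in b[i]]
            total = sum(e)
            c.append([x / total for x in e])
        # upper capsules: squashed, coupling-weighted sum of predictions
        v = []
        for j in range(n_upper):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower)) for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:
            # raise the logit wherever a prediction agrees (dot product) with the output
            for i in range(n_lower):
                for j in range(n_upper):
                    b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# three lower capsules agree about upper capsule 0 and disagree about upper capsule 1
u_hat = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[2.0, 0.0], [0.0, -2.0]],
    [[2.0, 0.0], [-2.0, 0.0]],
]
v, c = dynamic_routing(u_hat)
print([math.hypot(*vec) for vec in v])
```

Agreeing predictions pull their coupling toward the same upper capsule, so its output vector grows long while the capsule receiving conflicting predictions stays short.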
During the talk I will explain the concepts presented above.
The paper can be found here: https://arxiv.org/pdf/2108.08557.pdf
Poster 19: Bayesian inference of cell type proportions in mouse brain and prostate cancer data
Agnieszka Geras (MiNI PW), Shadi Darvish Shafighi (MIMUW), Łukasz Rączkowski (MIMUW), Igor Filipiuk (MIMUW), Hosein Toosi (KTH Royal Institute of Technology), Leszek Kaczmarek (Nencki Institute of Experimental Biology of the Polish Academy of Sciences), Łukasz Koperski (Medical University of Warsaw), Jens Lagergren (KTH Royal Institute of Technology), Dominika Nowis (Medical University of Warsaw), Ewa Szczurek (MIMUW)
Various technologies for measuring gene expression levels, such as bulk sequencing, single-cell RNA-seq (scRNA-seq) and spatial transcriptomics, have been developed to gain insight into the functionality of cells and tissues under normal and abnormal conditions. Spatial transcriptomics (ST) maps gene activity spatially because, unlike scRNA-seq experiments, it retains information on the position of cells within the tissue. However, ST spots contain multiple cells, so the observed signal inevitably reflects mixtures of cells of different types. Numerous methods for cell type deconvolution in ST data have already been proposed. Unfortunately, these methods require both types of data (ST and scRNA-seq) and may be prone to bias due to platform-specific effects, such as sequencing depth. To address these issues, we present an approach that does not require single-cell data but instead needs additional prior knowledge of marker genes. Our novel probabilistic model, called Celloscope, is built on probabilistic graphical models. Its parameters are inferred using Markov chain Monte Carlo (MCMC), more specifically, a Gibbs-within-Metropolis-Hastings scheme.
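The Celloscope model itself is not specified in the abstract, so this sketch only illustrates the general idea of combining Gibbs-style sweeps with Metropolis-Hastings steps: each parameter is updated in turn by a random-walk Metropolis-Hastings step while the others are held fixed (often called Metropolis-within-Gibbs). The toy correlated Gaussian target and the function name are hypothetical stand-ins for a real posterior.

```python
import math
import random


def metropolis_within_gibbs(log_density, init, n_samples, step=1.0, seed=0):
    """Component-wise random-walk Metropolis-Hastings sampler.

    log_density: function returning the unnormalised log posterior of a parameter list.
    Each sweep updates every coordinate once, conditional on the current others.
    """
    rng = random.Random(seed)
    x = list(init)
    current = log_density(x)
    samples = []
    for _ in range(n_samples):
        for d in range(len(x)):
            proposal = list(x)
            proposal[d] += rng.gauss(0.0, step)
            candidate = log_density(proposal)
            # symmetric proposal, so the acceptance probability is the density ratio
            if rng.random() < math.exp(min(0.0, candidate - current)):
                x, current = proposal, candidate
        samples.append(list(x))
    return samples


def log_target(v, rho=0.8):
    """Toy target: standard bivariate normal with correlation rho."""
    x, y = v
    return -(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho * rho))


draws = metropolis_within_gibbs(log_target, [0.0, 0.0], 20000, seed=1)[1000:]  # drop burn-in
```

In a model like Celloscope, the coordinates would instead be blocks of model parameters, and conjugate blocks could be drawn exactly (pure Gibbs) while non-conjugate ones use Metropolis-Hastings steps as above.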
Celloscope was applied to data from a sagittal section of the mouse brain. It successfully indicated known brain structures and spatially distinguished between the two main neuron types: inhibitory and excitatory. We also investigate the immune contexture of the tumour microenvironment in prostate cancer, which is a vitally important task, as cancer progression and the potential success of subsequent treatment are greatly influenced by the composition of immune cells in the heterogeneous tumour microenvironment.
Poster 20: An efficient manifold density estimator for all recommendation systems
Adam Jakubowski (Synerise.com), Jacek Dąbrowski (Synerise.com), Barbara Rychalska (Synerise.com), Michał Daniluk (Synerise.com), Dominika Basaj (Synerise.com)
Most current neural recommender systems for session-based data cast recommendation as a sequential or graph traversal problem, applying recurrent networks (LSTM/GRU) or graph neural networks (GNNs).
This makes the systems increasingly elaborate in order to model complex user/item connection networks and results in poor scalability to large item spaces and long item view/click sequences.
Instead of focusing on the sequential nature of session-based recommendation, we propose to cast it as a density estimation problem on item sets.
We introduce EMDE (Efficient Manifold Density Estimator), a method that uses arbitrary vector representations with the property of local similarity to succinctly represent smooth probability densities on Riemannian manifolds, using compressed representations we call sketches.
Within EMDE, session behaviors are represented with weighted item sets, largely simplifying the sequential aspect of the problem.
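The core idea of a sketch can be illustrated by hashing item embeddings into buckets with several independent partitionings of the embedding space, then accumulating item weights per bucket. This minimal sketch makes a simplifying assumption: it uses plain random hyperplanes through the origin, whereas EMDE derives its partitions from the data. The class and parameter names (`SketchEncoder`, `n_partitions`, `n_planes`) are hypothetical.

```python
import random


class SketchEncoder:
    """Maps a weighted item set to a fixed-size histogram ("sketch") over LSH buckets."""

    def __init__(self, dim, n_partitions=4, n_planes=3, seed=0):
        rng = random.Random(seed)
        # each partitioning is a set of random hyperplane normals
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
            for _ in range(n_partitions)
        ]
        self.n_buckets = 2 ** n_planes

    def buckets(self, embedding):
        """Bucket id per partitioning: one bit per hyperplane side."""
        ids = []
        for planes in self.planes:
            code = 0
            for bit, normal in enumerate(planes):
                if sum(n * e for n, e in zip(normal, embedding)) >= 0:
                    code |= 1 << bit
            ids.append(code)
        return ids

    def sketch(self, weighted_items):
        """weighted_items: iterable of (embedding, weight); order does not matter."""
        out = [0.0] * (len(self.planes) * self.n_buckets)
        for emb, weight in weighted_items:
            for p, b in enumerate(self.buckets(emb)):
                out[p * self.n_buckets + b] += weight
        return out


encoder = SketchEncoder(dim=3)
item_a, item_b = [0.2, 0.9, -0.1], [-0.7, 0.1, 0.5]
# e.g. a recency weighting: the latest item counts fully, the earlier one half
session_sketch = encoder.sketch([(item_a, 1.0), (item_b, 0.5)])
```

Because a session becomes a weighted set, the sketch is independent of item order and has a fixed size regardless of session length, which is the simplification of the sequential aspect that the abstract describes.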
Applying EMDE to both top-k and session-based recommendation settings, we establish new state-of-the-art results on multiple open datasets in both unimodal and multi-modal settings.
EMDE has also been applied to many other tasks and areas in top machine learning competitions involving recommendations and graph processing, taking the podium in KDD Cup 2021, WSDM Challenge 2021, and SIGIR eCom Challenge 2020.
We release the code at https://github.com/emde-conf/emde-session-rec
Poster 34: A short dive into active learning and various informativeness measures.
Antoni Jamiołkowski (QED Software)
During the talk I would like to present the problem of active learning and its relevance to machine learning research, explain common uncertainty sampling methods and share our findings about model-agnostic uncertainty measures.
In particular, "Label-in-the-loop" ("LitL" for short) is a human-in-the-loop system that merges sample selection, expert tagging, consensus voting and model training into a consistent and efficient ML loop. While developing "LitL", we encountered many interesting problems. One such problem is the generality of common uncertainty sampling methods: is max-entropy an ideal model-agnostic uncertainty measure? In our research, we experimented with various uncertainty measures, models, and datasets. During my presentation I would like to summarize those results and share our most important insights.
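The common uncertainty sampling measures differ only in how they score a model's predicted class probabilities, and they can disagree about which sample is most informative. This minimal sketch shows max-entropy, least-confidence and margin scoring on a hypothetical pool of three predictions; it is not the LitL implementation.

```python
import math


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confidence(probs):
    return 1.0 - max(probs)


def margin(probs):
    top = sorted(probs, reverse=True)
    return top[0] - top[1]  # a small margin means an uncertain prediction


def select_batch(pool_probs, k, measure="entropy"):
    """Indices of the k most uncertain samples under the chosen measure."""
    scorers = {
        "entropy": entropy,
        "least_confidence": least_confidence,
        "margin": lambda p: -margin(p),  # negate so that higher means more uncertain
    }
    if measure not in scorers:
        raise ValueError(f"unknown measure: {measure}")
    score = scorers[measure]
    ranked = sorted(range(len(pool_probs)), key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]


pool = [
    [0.5, 0.5, 0.0],    # torn between two classes
    [0.4, 0.3, 0.3],    # spread over three classes
    [0.9, 0.05, 0.05],  # confident
]
for measure in ("entropy", "least_confidence", "margin"):
    print(measure, select_batch(pool, 1, measure))
```

Entropy and least confidence pick the spread-out prediction, while margin picks the two-way tie, a small instance of the question of whether any single measure is ideal across models and datasets.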
Poster 35: What can we learn from data mining competition results? A case study at KnowledgePit.
Maciej Matraszek (QED Software, University of Warsaw), Andrzej Janusz (QED Software, University of Warsaw)
KnowledgePit.ml is a Polish data mining challenge platform where researchers from around the world compete to solve real-life problems. In the talk, I will describe a recent competition on predicting the winner of the Tactical Troops: Anthracite Shift video game. I will discuss the results and present distinct approaches taken by the contestants. Moreover, I will demonstrate how XAI techniques can be used to diagnose the solutions, even without direct access to the models used to create them, which could be of paramount importance to competition sponsors when choosing the solution to implement in their system.
Poster 21: Universal Neural Vocoding and improving its expressiveness with non-affine Normalizing Flows
Adam Gabryś (Alexa AI), Yunlong Jiao (Alexa AI), Viacheslav Klimkov (Alexa AI), Daniel Korzekwa (Alexa AI)
We present a general enhancement to the Normalizing Flows (NF) used in neural vocoding. As a case study, we improve expressive speech vocoding with a revamped Parallel WaveNet (PW). Specifically, we propose to extend the affine transformation of PW to a more expressive invertible non-affine function. The greater expressiveness of the improved PW leads to better perceived signal quality and naturalness in the waveform reconstruction and text-to-speech (TTS) tasks. We evaluate the model across different speaking styles on a multi-speaker, multilingual dataset. In the waveform reconstruction task, the proposed model closes the naturalness and signal quality gap from the original PW to recordings by 10%, and from other state-of-the-art neural vocoding systems by more than 60%. We also demonstrate improvements in objective metrics on the evaluation test set, with L2 Spectral Distance and Cross-Entropy reduced by 3% and 6‰, respectively, compared to the affine PW. Furthermore, we extend the probability density distillation procedure proposed in the original PW paper so that it works with any non-affine invertible and differentiable function. The work is presented in the context of universal neural vocoding, following our recent work on Parallel WaveNet with an additional conditioning network called Audio Encoder, which offers real-time, high-quality speech synthesis for a wide range of use cases. The system was tested on 43 internal speakers of diverse ages and genders, speaking 20 languages in 17 unique styles.
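The difference between an affine and a non-affine flow step can be seen in the change-of-variables formula log p(x) = log N(z) − log|f′(z)| with z = f⁻¹(x). The abstract does not name the authors' non-affine function, so this sketch uses the sinh-arcsinh transform as an assumed stand-in: it is monotone, analytically invertible and has a closed-form derivative, and it reduces to the identity for delta = 1, eps = 0. In a real vocoder, the parameters would be predicted per sample by the network.

```python
import math


class AffineTransform:
    """x = scale * z + shift, the per-sample form of an affine flow step."""

    def __init__(self, scale, shift):
        if scale <= 0:
            raise ValueError("scale must be positive")
        self.scale, self.shift = scale, shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_abs_det(self, z):
        return math.log(self.scale)


class SinhArcsinhTransform:
    """Illustrative non-affine step: x = sinh(delta * asinh(z) - eps), with delta > 0."""

    def __init__(self, delta, eps):
        if delta <= 0:
            raise ValueError("delta must be positive")
        self.delta, self.eps = delta, eps

    def forward(self, z):
        return math.sinh(self.delta * math.asinh(z) - self.eps)

    def inverse(self, x):
        return math.sinh((math.asinh(x) + self.eps) / self.delta)

    def log_abs_det(self, z):
        # dx/dz = delta * cosh(delta * asinh(z) - eps) / sqrt(1 + z^2), always positive
        return math.log(self.delta * math.cosh(self.delta * math.asinh(z) - self.eps) / math.sqrt(1 + z * z))


def log_likelihood(x, transform):
    """Change of variables with a standard normal base distribution."""
    z = transform.inverse(x)
    log_base = -0.5 * (z * z + math.log(2 * math.pi))
    return log_base - transform.log_abs_det(z)


flow = SinhArcsinhTransform(delta=1.5, eps=0.3)
print(log_likelihood(0.7, flow), log_likelihood(0.7, AffineTransform(1.0, 0.0)))
```

An affine step can only shift and rescale the base density, whereas the non-affine step can also skew it and change its tail weight. That extra flexibility is the kind of expressiveness the abstract refers to.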
Poster 36: The Session-Based kNN implementation in TensorFlow
Szymon Moliński (Sales Intelligence)
A common data structure in the e-commerce sector is a sequence of user actions called a session. Sessions have different lengths and, in most cases, come in large volumes within a dynamic product (item) environment. Deep learning approaches use complex recurrent neural network architectures to predict the next user action or the next item in the sequence. However, all complex and easily overfitted models should be compared to a baseline: a set of algorithms with slightly worse performance but great availability in the laboratory environment. Baseline session-based recommender algorithms are built from k-NN, Markov models and association rules. Unfortunately, this availability does not transfer to production environments, and software engineers must rewrite the baseline algorithms from scratch to handle production data.
Nowadays, we have a few production pipelines to choose from, and one of them is TensorFlow Extended (TFX). TFX supports TensorFlow functions and models (largely neural networks), and TensorFlow itself supports computations over a graph structure. This works well for implementing neural network models, but implementing memory-based algorithms such as k-NN is not straightforward. An additional layer of complexity comes from implementing sessions within k-NN and the item map used for recommendation.
Developing a memory-based model within TensorFlow is complex, and the converted model can be huge because it stores sessions and items in memory. Why bother, then? The first reason is to have a reference for the neural networks. The second reason is that, in our study, the k-NN-based model's performance was very close to that of fine-tuned neural network architectures, while the model had fewer parameters. The last reason is data volume: neural network models are not always well trained, and the amount of data is an important factor. For relatively small volumes of sessions, it is better to use simple models to prevent overfitting.
In this presentation, we will show the Sk-NN model implementation in TensorFlow and TensorFlow Extended, and compare the Sk-NN model with neural networks in terms of processing time vs. session set size and MRR vs. session set size.
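As a plain-Python reference for what the TensorFlow port has to reproduce, the following minimal sketch shows the common session-kNN scheme and the MRR metric used in the comparison. Several details are assumptions rather than the author's implementation: cosine similarity over binary item sets, scoring each candidate item by the summed similarity of neighbour sessions that contain it, excluding items already in the current session, and the function names.

```python
import math
from collections import defaultdict


def sknn_recommend(current_session, past_sessions, k=100, top_n=10):
    """Session-based kNN over binary item sets."""
    current = set(current_session)
    neighbours = []
    for session in past_sessions:
        items = set(session)
        overlap = len(current & items)
        if overlap:
            # cosine similarity of two binary item-set vectors
            neighbours.append((overlap / math.sqrt(len(current) * len(items)), items))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    scores = defaultdict(float)
    for similarity, items in neighbours[:k]:
        for item in items - current:  # design choice: do not re-recommend seen items
            scores[item] += similarity
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))  # ties by item id
    return [item for item, _ in ranked[:top_n]]


def mean_reciprocal_rank(recommendations, targets, cutoff=20):
    """MRR@cutoff: average of 1/rank of the true next item (0 if it is missing)."""
    total = 0.0
    for ranked, target in zip(recommendations, targets):
        top = ranked[:cutoff]
        if target in top:
            total += 1.0 / (top.index(target) + 1)
    return total / len(targets)


history = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
recs = sknn_recommend(["a"], history)
print(recs, mean_reciprocal_rank([recs], ["b"]))
```

All stored sessions are scanned at query time, which is why the model's size and latency grow with the session set, the trade-off the talk measures against neural networks.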
Poster 40: Metric Learning for Session-based Recommendations
Szymon Zaborowski (Sales Intelligence Sp. z o.o.), Bartłomiej Twardowski (Warsaw University of Technology), Paweł Zawistowski (Warsaw University of Technology)
Session-based recommenders, used for making predictions from users' uninterrupted sequences of actions, are attractive for many applications. For this task, we propose using metric learning, where a common embedding space for sessions and items is created and distance measures the dissimilarity between the provided sequence of user events and the next action. We discuss and compare metric learning approaches with commonly used learning-to-rank methods, between which some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither very large nor deep architectures are necessary to outperform existing methods.
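In a shared embedding space, training pushes a session's embedding closer to its true next item than to other items, and recommendation becomes a nearest-neighbour search. In this minimal sketch, a mean of item embeddings is a hypothetical session encoder and a Euclidean triplet hinge loss stands in for the authors' unspecified metric learning objective; the toy embeddings are made up.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def encode_session(item_ids, item_embeddings):
    """Hypothetical session encoder: mean of the session's item embeddings."""
    vectors = [item_embeddings[i] for i in item_ids]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def triplet_loss(session_vec, positive_vec, negative_vec, margin=1.0):
    """Hinge loss: the true next item should be closer than a negative item by at least `margin`."""
    return max(0.0, euclidean(session_vec, positive_vec) - euclidean(session_vec, negative_vec) + margin)


def rank_candidates(session_vec, candidates):
    """Recommend by ascending distance in the shared embedding space."""
    return sorted(candidates, key=lambda item: euclidean(session_vec, candidates[item]))


items = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [5.0, 5.0], "d": [1.0, 1.0], "e": [3.0, 0.0]}
session = encode_session(["a", "b"], items)
candidates = {key: items[key] for key in ("c", "d", "e")}
print(rank_candidates(session, candidates))
```

With a trained encoder, the loss gradients would move session and item embeddings so that observed next items end up ranked first; the ranking step itself stays a simple distance sort.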