Research Projects

KGLiDS: A Linked Data Science Platform

KGLiDS is a platform for constructing a knowledge graph for linked data science. We employ machine learning to extract the semantics of data science pipelines and capture them in a knowledge graph, which can then be exploited to assist data scientists in various ways. This abstraction is the key to enabling Linked Data Science, since it allows us to share the essence of pipelines between platforms, companies, and institutions without revealing critical internal information. Instead, it focuses on the semantics of what is being processed and how. We are developing different applications on top of our linked data science (LiDS) graph to automate various aspects of data science pipelines. Examples of these applications are KGpip and KGFarm.
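To make the idea of sharing a pipeline's semantics rather than its code more concrete, here is a minimal sketch. It is not how KGLiDS works: KGLiDS uses machine learning, whereas this stand-in uses plain static analysis with Python's `ast` module to reduce a script to triples naming the library APIs it calls. The triple format, the `calls` predicate, and the `pipeline:demo` identifier are hypothetical.

```python
import ast
from typing import Dict, List, Optional, Tuple

# Hypothetical triple format: (pipeline id, predicate, fully qualified API name).
Triple = Tuple[str, str, str]


def import_aliases(tree: ast.AST) -> Dict[str, str]:
    """Map each name bound by an import statement to the module path it refers to."""
    aliases: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    # `import a.b` binds only the top-level name `a`.
                    top = alias.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return aliases


def dotted_name(expr: ast.expr) -> Optional[str]:
    """Return 'a.b.c' for a chain of attribute accesses on a plain name, else None."""
    parts: List[str] = []
    while isinstance(expr, ast.Attribute):
        parts.append(expr.attr)
        expr = expr.value
    if isinstance(expr, ast.Name):
        parts.append(expr.id)
        return ".".join(reversed(parts))
    return None


def pipeline_triples(pipeline_id: str, source: str) -> List[Triple]:
    """Emit one 'calls' triple per call rooted at an imported name; the code is never executed."""
    tree = ast.parse(source)
    aliases = import_aliases(tree)
    triples: List[Triple] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue  # e.g. a method called on the result of another call
            head, _, rest = name.partition(".")
            if head in aliases:
                full = aliases[head] + ("." + rest if rest else "")
                triples.append((pipeline_id, "calls", full))
    return triples


script = '''
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("train.csv")
model = LogisticRegression().fit(df[["a"]], df["y"])
'''
for triple in pipeline_triples("pipeline:demo", script):
    print(triple)
```

Only the resolved API names (`pandas.read_csv`, `sklearn.linear_model.LogisticRegression`) leave the script; file contents and surrounding logic do not. This simplified resolver skips chained calls such as `.fit(...)` on an object.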

KGQAn: A Universal Question-Answering Platform for Knowledge Graphs

KGQAn aims to develop a data science chatbot that can answer questions over an arbitrary KG without prior knowledge of that KG. KGQAn proposes a novel formalization of question understanding as triple pattern extraction, modelled using a Seq2Seq neural network. Our model generalizes to questions across diverse domains. Moreover, KGQAn introduces a just-in-time linking and filtering approach, which performs entity and relation linking as semantic search queries partially offloaded to the RDF engine, without requiring any pre-processing of the KG. Thus, KGQAn acts as an on-demand question-answering service for KGs.
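A minimal sketch of the just-in-time linking idea, assuming the triple patterns have already been extracted (they are hard-coded below; in KGQAn they come from the Seq2Seq model). For each entity phrase it builds a query that the RDF engine itself answers, so no index of the KG is built in advance. The use of `rdfs:label` and standard SPARQL 1.1 `CONTAINS` filters is an assumption chosen for portability; it is lexical matching, not KGQAn's semantic search, and the example question and pattern are hypothetical.

```python
from typing import List, Tuple

# A triple pattern (subject, relation phrase, object); tokens starting with "?" are unknowns.
TriplePattern = Tuple[str, str, str]

RDFS_PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def entity_linking_query(phrase: str, limit: int = 10) -> str:
    """Ask the RDF engine for entities whose label contains every word of the phrase."""
    filters = "\n".join(
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{escape_literal(word.lower())}"))'
        for word in phrase.split()
    )
    return (
        f"{RDFS_PREFIX}\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f"{filters}\n"
        f"}} LIMIT {limit}"
    )


def linking_queries(patterns: List[TriplePattern]) -> List[str]:
    """Build one linking query per entity phrase (non-variable subject or object)."""
    queries = []
    for subject, _relation, obj in patterns:
        for term in (subject, obj):
            if not term.startswith("?"):
                queries.append(entity_linking_query(term))
    return queries


# Hypothetical output of question understanding for "Who founded Microsoft?"
patterns = [("?person", "founded", "Microsoft")]
for query in linking_queries(patterns):
    print(query)
```

Relation phrases (here `founded`) would be linked in a similar, query-driven way once candidate entities are known; that step is omitted from the sketch.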

KGNet: A GML-Enabled Knowledge Graph Platform

KGNet is a knowledge graph platform with full support for graph machine learning (GML)-enabled queries. We designed KGNet as an extension on top of existing RDF engines. KGNet provides GML as a service (GMLaaS) to automate the training of GML models on KGs. In KGNet, we collect the trained models’ metadata and maintain a transparent RDF graph associated with the target knowledge graph (KG). Using this metadata graph, KGNet can optimize and execute GML-enabled queries, which apply the trained models on the target KG.

KGpip: A Scalable AutoML Approach Based on GNN

KGpip is a scalable AutoML approach based on a novel formulation of the AutoML problem as a graph generation problem. In KGpip, we train a novel meta-learning model on top of our knowledge graph for linked data science, posing learner and pre-processing selection as the generation of graphs that represent ML pipelines. For more information, please read our KGpip paper.
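To illustrate what "learner and pre-processing selection as graph generation" means, the sketch below encodes a pipeline as a small directed graph and reads the selected pre-processors and learner back out of it. As a hypothetical stand-in for KGpip's trained graph generator, it simply reuses the pipeline graph of the nearest previously seen dataset under Euclidean distance over made-up meta-features (row and column counts); the step names are plain labels.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class PipelineGraph:
    """A pipeline as a chain: dataset -> pre-processors -> learner."""
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[int, int]] = field(default_factory=list)

    @classmethod
    def from_steps(cls, steps: Sequence[str]) -> "PipelineGraph":
        graph = cls()
        for i, step in enumerate(steps):
            graph.nodes.append(step)
            if i > 0:
                graph.edges.append((i - 1, i))  # data flows from the previous step
        return graph

    def decode(self) -> Tuple[List[str], str]:
        """Follow edges from node 0; middle nodes are pre-processors, the sink is the learner."""
        successor = dict(self.edges)
        order = [0]
        while order[-1] in successor:
            order.append(successor[order[-1]])
        names = [self.nodes[i] for i in order]
        return names[1:-1], names[-1]


def generate_pipeline(meta_features: Sequence[float],
                      seen: Dict[str, Tuple[Sequence[float], PipelineGraph]]) -> PipelineGraph:
    """Hypothetical stand-in for a learned generator: copy the nearest seen dataset's graph."""
    nearest = min(seen, key=lambda name: math.dist(meta_features, seen[name][0]))
    return seen[nearest][1]


seen = {
    "tabular_a": ([1000.0, 12.0], PipelineGraph.from_steps(
        ["dataset", "SimpleImputer", "StandardScaler", "LogisticRegression"])),
    "tabular_b": ([50000.0, 300.0], PipelineGraph.from_steps(
        ["dataset", "OneHotEncoder", "GradientBoosting"])),
}
preprocessors, learner = generate_pipeline([1200.0, 10.0], seen).decode()
print(preprocessors, learner)
```

The point of the graph encoding is that pre-processing and learner choices are made jointly, as one generated structure, rather than by separate searches.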

KGFarm: A Feature Discovery Platform for Data Science

KGFarm aims at automating data preparation and feature discovery pipelines. It is one of the applications built on top of our linked data science platform. KGFarm is a joint project with RBC's Borealis AI. We are developing KGFarm based on real industry needs, to enable data scientists to automatically learn from each other's pipelines.

AlphaBot: Improving Chatbots for Code Repositories

AlphaBot is a weak supervision-based approach to improving chatbots for code repositories. We evaluate AlphaBot on a dataset of 749 queries representing 52 intents. Our results show that AlphaBot helps chatbot practitioners boost the performance of the NLU in the early releases of their chatbots (i.e., when few training queries are available). In particular, our approach improves the NLU's performance by up to 44% compared to the baseline. The results also show that AlphaBot annotates, on average, 99% of queries correctly.
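Weak supervision, in general, replaces hand annotation with noisy heuristics whose votes are combined. The sketch below shows that generic idea, not AlphaBot's actual labeling strategy: hypothetical keyword-based labeling functions vote on the intent of an unlabeled query, and a majority vote produces the label, abstaining when there are no votes or a tie. The intent names and keywords are invented for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional

# A labeling function returns an intent name, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


def keyword_lf(keyword: str, intent: str) -> LabelingFunction:
    """Vote for `intent` whenever the query mentions `keyword` (case-insensitive)."""
    def lf(query: str) -> Optional[str]:
        return intent if keyword in query.lower() else None
    return lf


def weak_label(query: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Majority vote over non-abstaining labeling functions; None if no votes or a tie."""
    votes = [label for label in (lf(query) for lf in lfs) if label is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between intents: leave the query for a human
    return ranked[0][0]


# Hypothetical intents for a code-repository chatbot.
lfs = [
    keyword_lf("commit", "count_commits"),
    keyword_lf("how many", "count_commits"),
    keyword_lf("release", "latest_release"),
    keyword_lf("bug", "open_issues"),
]
for q in ["How many commits were pushed last week?",
          "What is the latest release?",
          "Which release fixed the bug?"]:
    print(q, "->", weak_label(q, lfs))
```

Abstaining on ties trades coverage for precision, which matters when the weakly labeled queries are then used to train the NLU.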

KGAPT: An APT Detection Approach Based on GNN

This project aims at developing a platform for detecting advanced persistent threats (APTs) based on knowledge graph technologies. Our approach uses graph neural networks and semantic graph similarity to detect attack scenarios in a provenance graph built from network logs.
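The sketch below illustrates only the graph-similarity half of the idea, with a deliberately simple measure in place of KGAPT's GNN: both the attack-scenario query graph and each candidate provenance subgraph are abstracted to (source type, action, target type) edges, and candidates whose Jaccard similarity to the query reaches a threshold are flagged. The entity types, actions, candidate names, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An abstracted provenance edge: (source entity type, system action, target entity type).
Edge = Tuple[str, str, str]


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_scenarios(query: Set[Edge], candidates: Dict[str, Set[Edge]],
                   threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs at or above the threshold, best first."""
    scored = [(name, jaccard(query, edges)) for name, edges in candidates.items()]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])


# Hypothetical attack scenario: read a payload, beacon out, spawn a child process.
attack = {
    ("process", "read", "file"),
    ("process", "connect", "socket"),
    ("process", "fork", "process"),
}
candidates = {
    "host1/pid42": {("process", "read", "file"), ("process", "connect", "socket"),
                    ("process", "write", "file")},
    "host2/pid7": {("process", "write", "file")},
}
print(flag_scenarios(attack, candidates))
```

A learned model such as a GNN replaces this exact-match overlap with similarity over embeddings, so that semantically related but differently labeled behaviour can still match.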

KG-DAL: Automatic Annotation for KG-Related Tasks

This project aims at developing a deep active learning platform for extracting triples from English text. Our platform automates the dataset annotation process required to train models for question understanding or knowledge graph construction.
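Deep active learning decides which unlabeled examples to send to a human annotator next. A common strategy, assumed here rather than taken from KG-DAL, is entropy-based uncertainty sampling: rank sentences by the entropy of the model's predicted label distribution and annotate the most uncertain ones first. The sentence IDs, the three labels, and the probabilities below are made up.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Pick the k sentences whose predicted distributions are most uncertain."""
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs: per-sentence probabilities over three labels.
predictions = {
    "s1": [0.90, 0.05, 0.05],  # confident
    "s2": [0.40, 0.35, 0.25],  # spread over all three labels
    "s3": [0.50, 0.50, 0.00],  # torn between two labels
}
print(select_for_annotation(predictions, k=2))
```

Here `s2` (entropy about 1.08) and `s3` (about 0.69) are sent for annotation before `s1` (about 0.39); after retraining on the new labels, the loop repeats with fresh predictions.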