Multi-objective optimization via Wasserstein-Fisher-Rao gradient flow

Conference paper
Yinuo Ren, Tesi Xiao, Tanmay Gangwani, Anshuka Rangi, Holakou Rahmanian, Lexing Ying, Subhajit Sanyal
International Conference on Artificial Intelligence and Statistics
Publication year: 2024
Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives with widespread applications. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a “dominance potential” to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly adept at managing Pareto fronts of complicated geometries. Our method is also theoretically grounded as a Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive experiments confirm that our approach outperforms state-of-the-art methods on challenging synthetic and real-world datasets.
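
As a rough illustration of the particle picture described above, the sketch below combines noisy gradient steps on random convex combinations of two toy objectives (an overdamped-Langevin-style move) with a periodic birth-death step that replaces dominated particles by copies of non-dominated ones. The objectives, the dominance test, and all parameters are illustrative assumptions, not the paper's exact dynamics or its dominance potential.

```python
# Illustrative sketch (not the paper's exact dynamics): particles take noisy
# gradient steps on random convex combinations of the objectives, and a
# birth-death step periodically replaces dominated particles with copies of
# non-dominated ones.
import numpy as np

rng = np.random.default_rng(0)

def f1(x): return np.sum((x - 1.0) ** 2, axis=-1)   # toy objective 1
def f2(x): return np.sum((x + 1.0) ** 2, axis=-1)   # toy objective 2
def grad_f1(x): return 2.0 * (x - 1.0)
def grad_f2(x): return 2.0 * (x + 1.0)

def dominated(F):
    """Mask of particles dominated by some other particle in all objectives."""
    n = F.shape[0]
    mask = np.zeros(n, dtype=bool)
    for i in range(n):
        better = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = np.any(better)
    return mask

n_particles, dim, steps, dt, noise = 64, 2, 500, 0.05, 0.05
X = rng.normal(size=(n_particles, dim)) * 3.0

for t in range(steps):
    w = rng.uniform(size=(n_particles, 1))           # random objective weights
    grad = w * grad_f1(X) + (1.0 - w) * grad_f2(X)
    X = X - dt * grad + np.sqrt(2.0 * dt) * noise * rng.normal(size=X.shape)
    if t % 20 == 0:                                  # birth-death resampling
        F = np.stack([f1(X), f2(X)], axis=1)
        dom = dominated(F)
        survivors = np.where(~dom)[0]
        if dom.any() and len(survivors) > 0:
            X[dom] = X[rng.choice(survivors, size=dom.sum())] \
                     + 0.1 * rng.normal(size=(dom.sum(), dim))

print("approximate Pareto set spans", np.round(X.min(axis=0), 2),
      "to", np.round(X.max(axis=0), 2))
```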

EYE-Llama, an in-domain large language model for ophthalmology

Article
Tania Haghighi, Sina Gholami, Jared Todd Sokol, Enaika Kishnani, Adnan Ahsaniyan, Holakou Rahmanian, Fares Hedayati, Theodore Leng, Minhaj Nur Alam
bioRxiv
Publication year: 2024

Background:

Training Large Language Models (LLMs) with in-domain data can significantly enhance their performance, leading to more accurate and reliable question-answering (QA) systems essential for supporting clinical decision-making and educating patients.

Methods:

This study introduces LLMs trained on in-domain, well-curated ophthalmic datasets. We also present a substantial open-source ophthalmic language dataset for model training. Our LLMs (EYE-Llama) were first pre-trained on an ophthalmology-specific corpus, including paper abstracts, textbooks, EyeWiki, and Wikipedia articles. Subsequently, the models underwent fine-tuning on a diverse range of QA datasets. The LLMs at each stage were then compared to baseline Llama 2, ChatDoctor, and ChatGPT (GPT-3.5) models on four distinct test sets, and evaluated quantitatively (accuracy, F1 score, and BERTScore) and qualitatively by two ophthalmologists.

Results:

Upon evaluating the models on the American Academy of Ophthalmology (AAO) test set with BERTScore as the metric, our models surpassed both Llama 2 and ChatDoctor in F1 score and performed on par with ChatGPT, which was trained with 175 billion parameters (EYE-Llama: 0.57, Llama 2: 0.56, ChatDoctor: 0.56, ChatGPT: 0.57). When evaluated on the MedMCQA test set, the fine-tuned models demonstrated higher accuracy than the Llama 2 and ChatDoctor models (EYE-Llama: 0.39, Llama 2: 0.33, ChatDoctor: 0.29); however, ChatGPT outperformed EYE-Llama with an accuracy of 0.55. When tested on the PubMedQA set, the fine-tuned model showed improved accuracy over the Llama 2, ChatGPT, and ChatDoctor models (EYE-Llama: 0.96, Llama 2: 0.90, ChatGPT: 0.93, ChatDoctor: 0.92).
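
For reference, the sketch below shows how the two simplest of these quantitative metrics, multiple-choice accuracy and token-overlap F1, can be computed; it is generic evaluation code under assumed input formats, not the study's actual pipeline, which also uses BERTScore and expert review.

```python
# Generic QA evaluation sketch (not the study's exact pipeline):
# accuracy for multiple-choice answers and token-overlap F1 for free text.
from collections import Counter

def accuracy(predictions, answers):
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def token_f1(prediction, answer):
    pred_tokens = prediction.lower().split()
    ans_tokens = answer.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ans_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ans_tokens)
    return 2 * precision * recall / (precision + recall)

# toy usage with hypothetical predictions and gold answers
preds = ["open angle glaucoma", "B"]
golds = ["primary open angle glaucoma", "B"]
print(accuracy(preds, golds), token_f1(preds[0], golds[0]))
```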

Conclusion:

The study shows that pre-training and fine-tuning LLMs like EYE-Llama enhances their performance in specific medical domains. Our EYE-Llama models surpass baseline Llama 2 in all evaluations, highlighting the effectiveness of specialized LLMs in medical QA systems. (Funded by NEI R15EY035804 (MNA) and UNC Charlotte Faculty Research Grant (MNA).)

Accelerating Sinkhorn algorithm with sparse Newton iterations

Conference paper
Xun Tang, Michael Shavlovsky, Holakou Rahmanian, Elisa Tardini, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying
The Twelfth International Conference on Learning Representations.
Publication year: 2024
Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which utilizes only matrix scaling and guarantees an approximate solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow due to the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm, by introducing early stopping for the matrix scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsification of the Hessian results in a fast O(n^2) per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transportation between empirical distributions and calculating the Wasserstein distance of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.
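
For context, the sketch below implements the plain Sinkhorn matrix-scaling iteration that SNS truncates with early stopping; the second-stage Newton step on the dual potential with a sparsified Hessian is only mentioned in the comments, since a faithful reproduction is beyond a few lines. The toy cost matrix and histograms are illustrative assumptions.

```python
# Plain Sinkhorn matrix scaling for entropic OT (the first stage that SNS cuts
# short with early stopping); the second-stage Newton step on the dual
# potential with a sparsified Hessian is omitted here.
import numpy as np

def sinkhorn(C, r, c, eta=0.05, n_iter=500):
    """Approximate OT plan between histograms r and c with cost matrix C."""
    K = np.exp(-C / eta)                 # Gibbs kernel
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)                  # scale rows to match r
        v = c / (K.T @ u)                # scale columns to match c
    return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

# toy usage: OT between two small histograms on a 1-D grid
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2
r = np.exp(-(x - 0.3) ** 2 / 0.01); r /= r.sum()
c = np.exp(-(x - 0.7) ** 2 / 0.01); c /= c.sum()
P = sinkhorn(C, r, c)
print("transport cost:", float((P * C).sum()))
```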

A Sinkhorn-type Algorithm for Constrained Optimal Transport

Article
Xun Tang, Holakou Rahmanian, Michael Shavlovsky, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying
arXiv preprint arXiv:2403.05054
Publication year: 2024
Entropic optimal transport (OT) and the Sinkhorn algorithm have made it practical for machine learning practitioners to perform the fundamental task of calculating transport distances between statistical distributions. In this work, we focus on a general class of OT problems under a combination of equality and inequality constraints. We derive the corresponding entropy regularization formulation and introduce a Sinkhorn-type algorithm for such constrained OT problems supported by theoretical guarantees. We first bound the approximation error incurred by solving the problem through entropic regularization, which decreases exponentially as the regularization parameter increases. Furthermore, we prove a sublinear first-order convergence rate of the proposed Sinkhorn-type algorithm in the dual space by characterizing the optimization procedure with a Lyapunov function. To achieve fast and higher-order convergence under weak entropy regularization, we augment the Sinkhorn-type algorithm with dynamic regularization scheduling and second-order acceleration. Overall, this work systematically extends recent theoretical and numerical advances in entropic optimal transport to the constrained case, allowing practitioners to derive approximate transport plans in complex scenarios.
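
For orientation, the display below recalls the standard entropic OT objective over the transport plan P, together with an assumed general form of the extra linear equality and inequality constraints; the paper's exact constrained formulation may differ in notation and detail.

```latex
% Standard entropic OT objective with marginal constraints; the additional
% constraints <A_k, P> = b_k and <D_l, P> <= e_l are an assumed general form.
\min_{P \ge 0}\; \langle C, P \rangle
  + \eta \sum_{i,j} P_{ij}\,(\log P_{ij} - 1)
\quad \text{s.t.} \quad
  P\mathbf{1} = r,\;\; P^{\top}\mathbf{1} = c,\;\;
  \langle A_k, P \rangle = b_k,\;\; \langle D_l, P \rangle \le e_l .
```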

Extended Formulations via Decision Diagrams

Conference paper
Yuta Kurokawa, Ryotaro Mitsuboshi, Haruki Hamasaki, Kohei Hatano, Eiji Takimoto, Holakou Rahmanian
International Computing and Combinatorics Conference (pp. 17-28)
Publication year: 2023
We propose a general algorithm for constructing an extended formulation for any given set of linear constraints with integer coefficients. Our algorithm consists of two phases: first construct a decision diagram (V,E) that somehow represents a given m×n constraint matrix, and then build an equivalent set of |E| linear constraints over n+|V| variables. That is, the size of the resultant extended formulation depends not explicitly on the number m of the original constraints, but on the size of its decision diagram representation. Therefore, we may significantly reduce the computation time for optimization problems with integer constraint matrices by solving them under the extended formulations, especially when we obtain concise decision diagram representations for the matrices. We can apply our method to 1-norm regularized hard margin optimization over the binary instance space {0,1}^n, which can be formulated as a linear programming problem with m constraints with {−1,0,+1}-valued coefficients over n variables, where m is the size of the given sample. Furthermore, by introducing slack variables over the edges of the decision diagram, we establish a variant formulation of soft margin optimization. We demonstrate the effectiveness of our extended formulations on integer programming and 1-norm regularized soft margin optimization tasks over synthetic and real datasets.
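
As a point of reference, the sketch below solves the standard (non-extended) linear program for 1-norm regularized hard-margin separation mentioned above, using SciPy on a tiny synthetic sample; the decision-diagram-based extended formulation itself, with its n+|V| variables and |E| constraints, is not reproduced here.

```python
# Standard LP for 1-norm regularized hard-margin separation (the starting point
# described in the abstract); this is NOT the extended formulation, which would
# add |V| auxiliary variables derived from a decision diagram.
import numpy as np
from scipy.optimize import linprog

X = np.array([[0, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
m, n = X.shape

# variables: [w_plus (n), w_minus (n), b (1)], with w = w_plus - w_minus
cost = np.concatenate([np.ones(n), np.ones(n), [0.0]])           # minimize ||w||_1
# hard-margin constraints y_i (w . x_i + b) >= 1  ->  -y_i (w . x_i + b) <= -1
A_ub = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None]])
b_ub = -np.ones(m)
bounds = [(0, None)] * (2 * n) + [(None, None)]                   # bias b is free

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w = res.x[:n] - res.x[n:2 * n]
print("w =", np.round(w, 3), " b =", round(res.x[-1], 3))
```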

Toward Understanding Privileged Features Distillation in Learning-to-Rank

Conference paper
Shuo Yang, Sujay Sanghavi, Holakou Rahmanian, Jan Bakus, S.V.N. Vishwanathan
36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Publication year: 2022

In learning-to-rank problems, a privileged feature is one that is available during model training but not at test time. Such features naturally arise in merchandised recommendation systems; for instance, “user clicked this item” as a feature is predictive of “user purchased this item” in the offline data, but is clearly not available during online serving. Another source of privileged features is those that are too expensive to compute online but feasible to add offline. Privileged features distillation (PFD) refers to a natural idea: train a “teacher” model using all features (including privileged ones) and then use it to train a “student” model that does not use the privileged features. In this paper, we first study PFD empirically on three public ranking datasets and an industrial-scale ranking problem derived from Amazon’s logs. We show that PFD outperforms several baselines (no-distillation, pretraining-finetuning, self-distillation, and generalized distillation) on all these datasets. Next, we analyze why and when PFD performs well via both empirical ablation studies and theoretical analysis for linear models. Both investigations uncover an interesting non-monotone behavior: as the predictive power of a privileged feature increases, the performance of the resulting student model initially increases but then decreases. We show that the reason for the latter decrease in performance is that a very predictive privileged teacher produces predictions with high variance, which lead to high-variance student estimates and inferior testing performance.
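
To make the recipe concrete, here is a minimal linear-model sketch of the generic PFD idea: fit a teacher by least squares on the regular features plus the privileged feature, then fit the student to the teacher's predictions using the regular features only. The synthetic data and feature names are illustrative assumptions, not the paper's datasets or models.

```python
# Minimal linear-model sketch of privileged features distillation (PFD):
# the teacher sees the privileged feature, the student is fit to the teacher's
# predictions using only the regular features.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))                                        # regular features
priv = x @ np.array([1.0, -0.5, 0.2]) + 0.3 * rng.normal(size=n)   # privileged feature
y = 2.0 * priv + x[:, 0] + rng.normal(size=n)                      # relevance label

def least_squares(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

X_teacher = np.column_stack([x, priv])
w_teacher = least_squares(X_teacher, y)
teacher_pred = X_teacher @ w_teacher               # soft targets from the teacher

w_student = least_squares(x, teacher_pred)         # distilled student
w_direct = least_squares(x, y)                     # no-distillation baseline

print("student weights:", np.round(w_student, 2))
print("direct weights: ", np.round(w_direct, 2))
```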

Online Non-Additive Path Learning under Full and Partial Information

Conference paper
Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred Warmuth
ALT '19, Proceedings of the 30th International Conference on Algorithmic Learning Theory, PMLR 98:274-299, 2019.
Publication year: 2019

Abstract. We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction. We present new online algorithms for path learning with non-additive count-based gains for the three settings of full information, semi-bandit and full bandit with very favorable regret guarantees. A key component of our algorithms is the definition and computation of an intermediate context-dependent automaton that enables us to use existing algorithms designed for additive gains. We further apply our methods to the important application of ensemble structured prediction. Finally, beyond count-based gains, we give an efficient implementation of the EXP3 algorithm for the full bandit setting with an arbitrary (non-additive) gain.
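
For context, the sketch below is a textbook EXP3 implementation over a small, explicitly enumerated set of arms (think of each arm as one path); the paper's contribution, an efficient implementation when the paths are exponentially many and the gains non-additive, is not captured by this brute-force version.

```python
# Generic EXP3 over K explicitly enumerated arms (think: K paths); the paper's
# point is an efficient implementation when paths are exponentially many and
# gains are non-additive, which this brute-force sketch does not attempt.
import numpy as np

def exp3(gain_fn, K, T, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.ones(K)
    total = 0.0
    for t in range(T):
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)
        g = gain_fn(t, arm)                                     # observed gain in [0, 1]
        total += g
        weights[arm] *= np.exp(gamma * (g / probs[arm]) / K)    # importance-weighted update
        weights /= weights.max()                                # keep weights bounded
    return total

# toy usage: arm 2 has the highest mean gain
means = [0.3, 0.5, 0.8]
toy_gain = lambda t, a: float(np.random.default_rng(1000 + t).uniform() < means[a])
print("cumulative gain:", exp3(toy_gain, K=3, T=2000))
```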

Online Learning of Combinatorial Objects via Extended Formulation

Conference paper
Holakou Rahmanian, David P. Helmbold and S.V.N. Vishwanathan
ALT '18, Proceedings of Algorithmic Learning Theory, PMLR 83:702-724, 2018.
Publication year: 2018

Abstract. The standard techniques for online learning of combinatorial objects perform multiplicative updates followed by projections into the convex hull of all the objects. However, this methodology can be expensive if the convex hull contains many facets. For example, the convex hull of n-symbol Huffman trees is known to have exponentially many facets. We get around this difficulty by exploiting extended formulations, which encode the polytope of combinatorial objects in a higher dimensional “extended” space with only polynomially many facets. We develop a general framework for converting extended formulations into efficient online algorithms with good relative loss bounds. We present applications of our framework to online learning of Huffman trees and permutations. The regret bounds of the resulting algorithms are within a factor of O(√log(n)) of the state-of-the-art specialized algorithms for permutations, and depending on the loss regimes, improve on or match the state-of-the-art for Huffman trees. Our method is general and can be applied to other combinatorial objects.
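
For reference, the sketch below is the vanilla Hedge update over explicitly enumerated objects, i.e. exactly the baseline whose exponential blow-up the extended-formulation approach avoids; it is a generic illustration of the multiplicative-update primitive, not the paper's algorithm.

```python
# Vanilla Hedge over explicitly enumerated objects; the paper avoids this
# explicit enumeration (exponentially many objects) by running the update in
# the polynomially sized extended space instead.
import numpy as np

def hedge(losses, eta=0.5):
    """losses: (T, N) array with the loss of each of N objects at each trial."""
    T, N = losses.shape
    w = np.ones(N) / N
    total = 0.0
    for t in range(T):
        total += float(w @ losses[t])          # expected loss of the sampled object
        w = w * np.exp(-eta * losses[t])       # multiplicative update
        w = w / w.sum()                        # renormalize onto the simplex
    return total

rng = np.random.default_rng(0)
L = rng.uniform(size=(100, 8))                 # 8 toy "objects", 100 trials
print("Hedge loss:", round(hedge(L), 2), " best single object:", round(L.sum(0).min(), 2))
```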

Online Learning of Combinatorial Objects

Thesis
Holakou Rahmanian
PhD Thesis. UC Santa Cruz.
Publication year: 2018

Abstract. This thesis develops algorithms for learning combinatorial objects. A combinatorial object is a structured concept composed of components. Examples are permutations, Huffman trees, binary search trees, and paths in a directed graph. Learning combinatorial objects is a challenging problem: First, the number of combinatorial objects is typically exponential in the number of components. Second, the convex hull of these objects is a polytope whose characterization in the original space may have exponentially many facets, or a description of the polytope in terms of facets/inequalities may not even be known. Finally, the loss of each object could be a complicated function of its components and may not be simply additive in the components. In this thesis, we explore a wide variety of combinatorial objects and address the challenges above. For each combinatorial object, we go beyond the original space of the problem and introduce auxiliary spaces and representations. The representation of the objects in these auxiliary spaces admits additive losses and polytopes with polynomially many facets. This allows us to extend well-known algorithms like Expanded Hedge and Component Hedge to these combinatorial objects for the first time.

Deep Embedding Forest: Forest-based Serving with Deep Embedding Features

Patent
Ying Shan, Jianchang Mao, Dong Yu, Holakou Rahmanian, Yi Zhang
US Patent
Publication year: 2018

Abstract. A deep embedding forest-based (DEF) model for improving on-line serving time for classification learning methods and other tasks such as, for example, predicting user selection of search results provided in response to a query, or image, speech, or text recognition. Initially, a deep neural network (DNN) model is trained to determine parameters of an embedding layer, a stacking layer, deep layers, and a scoring layer, thereby reducing high dimensional features. After training the DNN model, the parameters of the deep layers and the scoring layer of the DNN model are discarded and the parameters of the embedding layer and the stacking layer are extracted. The extracted parameters from the DNN model then initialize the parameters of an embedding layer and a stacking layer of the DEF model, such that only a forest layer of the DEF model then needs to be trained. Output from the DEF model is stored in computer memory.

Online Learning of Permutations Using Extended Formulation

Workshop paper
Holakou Rahmanian, David P. Helmbold, and S.V.N. Vishwanathan
Discrete Structures in Machine Learning Workshop at NeurIPS 2017
Publication year: 2017

Abstract. We introduce a new method for efficiently learning permutations that exploits the technique of extended formulation from the combinatorial optimization community. This extended formulation technique encodes the hard-to-describe polytope of permutations as the projection of an easier-to-describe polytope in a higher dimensional space. Although the best special-purpose algorithm for permutations has a slightly better regret bound, our methodology yields regret bounds like those of other published algorithms. The new methodology can be generalized to other combinatorial objects. It also has an elegant method of determining the algorithm’s initial weight vector, and the use of the extended formulation leads to a faster and more natural way to make appropriate predictions.

Online Dynamic Programming

Conference paper
Holakou Rahmanian, Manfred Warmuth
NeurIPS 2017, Advances in Neural Information Processing Systems, Volume 30
Publication year: 2017

Abstract. We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials. An instance of the type of problems we consider is to find a good binary search tree in a changing environment. At the beginning of each trial, the learner probabilistically chooses a tree with the n keys at the internal nodes and the n + 1 gaps between keys at the leaves. The learner is then told the frequencies of the keys and gaps and is charged the average search cost for the chosen tree. The problem is online because the frequencies can change between trials. The goal is to develop algorithms with the property that their total average search cost (loss) in all trials is close to the total loss of the best tree chosen in hindsight for all trials. The challenge, of course, is that the algorithm has to deal with an exponential number of trees. We develop a general methodology for tackling such problems for a wide class of dynamic programming algorithms. Our framework allows us to extend online learning algorithms like Hedge and Component Hedge to a significantly wider class of combinatorial objects than was possible before.
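
For the binary search tree example, the per-trial comparator is the classical optimal-BST dynamic program over key and gap frequencies; a textbook sketch of that offline computation (not the online algorithm of the paper) is given below, with toy frequencies as an illustrative assumption.

```python
# Classical optimal-BST dynamic program: given frequencies p[1..n] of the n
# keys and q[0..n] of the n+1 gaps, compute the minimum expected search cost.
# This is the offline comparator only; the paper's online algorithm is not
# reproduced here.
import math

def optimal_bst_cost(p, q):
    n = len(p)
    # e[i][j], w[i][j] for subproblems on keys i..j (1-indexed), with j >= i-1
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
            e[i][j] = math.inf
            for r in range(i, j + 1):          # try each key as the root
                cand = e[i][r - 1] + e[r + 1][j] + w[i][j]
                e[i][j] = min(e[i][j], cand)
    return e[1][n]

# toy frequencies for 5 keys and the 6 gaps around them
p = [0.15, 0.10, 0.05, 0.10, 0.20]
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]
print("expected search cost of the best tree:", optimal_bst_cost(p, q))
```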

Deep Embedding Forest: Forest-based Serving with Deep Embedding Features

Conference paper
Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, Yi Zhang
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2017, Pages 1703–1711
Publication year: 2017

Abstract. Deep Neural Networks (DNNs) have demonstrated a superior ability to extract high level embedding vectors from low level features. Despite this success, serving time is still the bottleneck due to the expensive run-time computation of multiple layers of dense matrices. GPGPU, FPGA, or ASIC-based serving systems require additional hardware that is not in the mainstream design of most commercial applications. In contrast, tree or forest-based models are widely adopted because of their low serving cost, but they depend heavily on carefully engineered features. This work proposes a Deep Embedding Forest model that benefits from the best of both worlds. The model consists of a number of embedding layers and a forest/tree layer. The former maps high dimensional (hundreds of thousands to millions) and heterogeneous low-level features to lower dimensional (thousands) vectors, and the latter ensures fast serving. Built on top of a representative DNN model called Deep Crossing, and two forest/tree-based models, XGBoost and LightGBM, a two-step Deep Embedding Forest algorithm is demonstrated to achieve on-par or slightly better performance compared with the DNN counterpart, with only a fraction of the serving time on conventional hardware. After comparison with a joint optimization algorithm called partial fuzzification, also proposed in this paper, it is concluded that the two-step Deep Embedding Forest achieves near optimal performance. Experiments based on large scale data sets (up to 1 billion samples) from a major sponsored search engine prove the efficacy of the proposed model.
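
As a rough illustration of the two-step recipe, the sketch below uses scikit-learn stand-ins: a small neural network is trained, its first hidden layer is reused as an embedding, and a gradient-boosted tree ensemble is trained on those embeddings. The component choices (MLPClassifier, GradientBoostingClassifier) and the synthetic data are illustrative assumptions; the paper uses Deep Crossing with XGBoost/LightGBM at much larger scale.

```python
# Two-step sketch in the spirit of Deep Embedding Forest, with scikit-learn
# stand-ins: (1) train a small neural net and reuse its first hidden layer as
# an "embedding", (2) train a tree ensemble on those embeddings for cheap serving.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=4000, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: train the DNN and keep only its first hidden layer as the embedding.
dnn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0)
dnn.fit(X_tr, y_tr)

def embed(Z):
    # forward pass through the first layer only (relu is MLPClassifier's default)
    return np.maximum(0.0, Z @ dnn.coefs_[0] + dnn.intercepts_[0])

# Step 2: discard the upper layers and train a forest/tree layer on embeddings.
forest = GradientBoostingClassifier(random_state=0)
forest.fit(embed(X_tr), y_tr)

print("DNN accuracy:      ", round(dnn.score(X_te, y_te), 3))
print("DEF-style accuracy:", round(forest.score(embed(X_te), y_te), 3))
```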

Breathing Rate Prediction Using Finger-tip Sensor

Thesis
Holakou Rahmanian
Masters Thesis. UC Santa Cruz.
Publication year: 2015

Abstract. Personalized health care is trending, and individuals increasingly wear sensors to record their own health data. As part of this trend, any redundancy in the data captured by wearable sensors must be exploited to reduce the number of devices one has to wear. In this thesis, we work with a device that senses breathing and pulse through a pressure tube and pulse oximetry, respectively. Extracting the dependency between these two measurements, we approximately predict the breathing rate by first reconstructing the breathing signal from the finger-tip sensor data and then detecting the peaks in the reconstructed signal. For breathing signal reconstruction, two different techniques are used: (1) applying low- and high-pass filters to the pulse signal, and (2) training a neural network on a prepared dataset. Our experiments show that neural networks perform better than filters in reconstructing the breathing signal and, consequently, in predicting the breathing rate.
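
As a rough illustration of the filter-based variant, the sketch below band-passes a synthetic pulse-like signal around typical respiratory frequencies and counts peaks to estimate breaths per minute; the pass band, the synthetic signal, and all parameters are illustrative assumptions, not the thesis settings or its sensor data.

```python
# Filter-based sketch on a synthetic signal: a slow "respiratory" modulation is
# mixed with a faster "cardiac" oscillation, recovered with a band-pass filter,
# and its peaks are counted to estimate breaths per minute.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 100.0                                   # sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)                 # one minute of data
breath = np.sin(2 * np.pi * 0.25 * t)        # ~15 breaths/min component
pulse = np.sin(2 * np.pi * 1.2 * t)          # ~72 beats/min component
signal = pulse + 0.3 * breath + 0.05 * np.random.default_rng(0).normal(size=t.size)

# band-pass around typical respiratory frequencies (0.1-0.5 Hz)
b, a = butter(3, [0.1, 0.5], btype="band", fs=fs)
resp = filtfilt(b, a, signal)

peaks, _ = find_peaks(resp, distance=int(fs))   # at most one peak per second
print("estimated breathing rate:", len(peaks), "breaths/min")
```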