
Hi, I am Wissam

Wissam Antoun

PhD Student in AI at ALMAnaCH - INRIA, Paris

Artificial Intelligence and Machine Learning PhD Student and Engineer specialised in state‑of‑the‑art Natural Language Processing techniques. I have worked on several NLP projects, including AraBERT, AraELECTRA, AraGPT2, and CamemBERTa.

Experiences

ALMAnaCH - INRIA

Mar 2022 - Present

Paris, France

PhD Researcher

Mar 2023 - Present

Responsibilities:
  • Developing methods to detect text generated by language models.
  • Researching the effect of a language model’s pre‑training data, architecture, watermarking, quantization, and RLHF on the detectability of its generated text.

Research Engineer

Mar 2022 - Feb 2023

Responsibilities:
  • Implementing new pre‑training techniques for French language representation.
  • Researching language models for languages displaying high variability, in particular Arabic dialects used on social media.

Siren Analytics

Jan 2021 - Feb 2022

Beirut, Lebanon

Senior Machine Learning Engineer

Jan 2021 - Feb 2022

Responsibilities:
  • Deployed AI models to production with a focus on performance and monitoring.
  • Researched and developed semantic search engines using state‑of‑the‑art NLP and IR approaches.
  • Built an OCR API for ID card scanning.
  • Designed and developed document AI solutions to extract information from scanned documents.
  • Developed a real‑time anomaly detection system for lockdown mobility permits.
  • Developed encryption solutions for model IP protection.
  • My work also involved researching and applying the latest AI solutions, evaluating software vendors, and mentoring interns.

American University of Beirut

Beirut, Lebanon

Graduate Research Assistant

Sep 2018 - Jan 2021

Responsibilities:
  • Co‑founded AUB’s Machine INtelligence Development (MIND) Lab.
  • My research focused on state‑of‑the‑art techniques for natural language understanding, generation, and chatbots.
  • Pre‑trained various Arabic language models, ranging from RNN‑based to Transformer‑based architectures, using Cloud TPU and GPU instances.
  • Researched improving the temporal and spatial resolution of remote sensing ML systems for soil moisture prediction.

Graduate Teaching Assistant

Feb 2018 - Jan 2020

Responsibilities:
  • Taught and graded undergraduate courses in computer science and engineering.
  • EECE 435L - Software Tools Lab, Co‑Instructor (3 semesters). Delivered a range of teaching and assessment activities, including tutorials on modern software tools, at the undergraduate level.
  • EECE 696 - Applied Parallel Programming, Lab Instructor (1 semester). Gave graduate students hands‑on experience in parallel programming and GPU computing using CUDA C/C++ on local machines and on AWS.
  • EECE 330 - Data Structures and Algorithms, Lab Instructor (1 semester). Gave undergraduate students hands‑on experience with the fundamental algorithms and data structures used in software applications, using C++.
  • Introduced a plagiarism detection tool for computer code to the Electrical Engineering Department.

Zaka AI

May 2020 - May 2020

Beirut, Lebanon

AI Instructor

May 2020 - May 2020

Responsibilities:
  • Gave a 4‑part online workshop series titled “A Comprehensive NLP series”, which covered the major NLP concepts through hands‑on coding examples alongside the fundamental theory.

Stars of Science

Feb 2020 - Apr 2020

Beirut, Lebanon

Machine Learning Expert

Feb 2020 - Apr 2020

Responsibilities:
  • Provided Machine Learning and Artificial Intelligence solutions as part of the support team that helped contestants throughout their Stars of Science journey.

Beirut, Lebanon

Intern

June 2016 - Aug 2016

Responsibilities:
  • Learned about LTE principles, base station hardware, and microwave transmission systems.

Education

Feb. 2018 ‑ Sep. 2020
Master of Engineering in Electrical and Computer Engineering
Major Area:
Artificial Intelligence and Machine Learning systems
Minor Area:
Software, Networking and Security
Scholarships:
Awarded Graduate Fellowship with full tuition coverage
Thesis:
Transformers for Arabic Natural Language Understanding and Generation
Supervisor:
Prof. Hazem Hajj
Awards:
Abdul Hadi Debs Endowment Award for Academic Excellence Nominee
Sep. 2013 ‑ Jul. 2017
Bachelor of Engineering in Computer and Communication Engineering
Focus:
Communications and Networking, Antennas and Propagation, and Digital Signal Processing
Awards:
Dean’s List for 6 semesters

Projects

AraBERT, AraELECTRA, and AraGPT2
Owner Feb 2020

Pre-trained Transformers for Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA)

CamemBERTa
Owner Jan 2023

Code for training DeBERTa V3 from scratch, used to train CamemBERTa.

GalacTex
Owner Dec 2023

Self-Hosted Large Language Models for Overleaf

SlurmTUI
Owner Dec 2023

Terminal UI for monitoring SLURM jobs

OarTUI
Nothing Jul 2023

TUI for browsing, canceling, and inspecting OAR jobs on a cluster using only the terminal.

Accomplishments

First Place, Arabic Sentiment Analysis 2021 @ KAUST
Announcement Aug. 2021

The competition required building machine learning models that determine the sentiment (positive, negative, or neutral) of Arabic tweets, with prizes of 10,000 USD, 5,000 USD, and 2,000 USD for the first, second, and third place winners, respectively. The competition ran for 3 months, and 74 teams participated and submitted predictions. Wissam ranked 3rd on the public leaderboard when the submission window closed. The top-ranked participants were then invited to share their code with the organizers for a final evaluation on a separate private dataset of 20,000 tweets. Wissam’s submission scored the highest on the private dataset.

Second Place, OSACT4 Shared Task on Offensive Language Detection
OSACT4 May 2020

The rise of offensive speech, including vulgar or targeted insults, reflects increasing polarization in society, amplified by social media. The shared task aimed to detect such speech in Arabic social media using the SemEval 2020 dataset, which contains tweets manually annotated for offensiveness (OFF or NOT_OFF) and hate speech (HS or NOT_HS). The dataset is split into train, dev, and test sets and covers two subtasks: offensive language detection and hate speech detection. Subtask B, identifying hate speech, is more challenging because hate speech is less prevalent in the data. Our team won second place in both subtasks, advancing research on identifying offensive content and hate speech in Arabic tweets.

Abdul Hadi Debs Endowment Award for Academic Excellence Nominee

The Abdul Hadi Debs Endowment Award for Academic Excellence is a $1,000 award given to a graduate student who has an outstanding academic record and has demonstrated research capabilities through a paper, project, or thesis deemed by the faculty to be worthy of publication. I was nominated for this award by the Department of Electrical and Computer Engineering.

In Track 1-A, we developed a machine learning model to detect fake news and identify the news domain, using an annotated training corpus provided via email. In Track 1-B, we built a model to distinguish between bot and human Twitter accounts, also using an annotated dataset provided by Marc Jones. I received recognition for creating the best domain detection system in Track 1-A.

Second Place, Best Car Price Prediction Score

First Place, Music Genre Classification

Skills