[1]\fnmSalwa \surAbbara \equalcontThese authors contributed equally to this work.

[1]\fnmAreej \surAlhothali \equalcontThese authors contributed equally to this work.

[1]\orgdivComputer Science Department, \orgnameFaculty of Computing and Information Technology, King Abdulaziz University, \orgaddress\streetStreet, \cityJeddah, \postcode21589, \countrySaudi Arabia

[2]\orgdivFaculty of Law, \orgnameKing Abdulaziz University, \orgaddress\streetStreet, \cityJeddah, \postcode21589, \countrySaudi Arabia

ALJP: An Arabic Legal Judgment Prediction in Personal Status Cases Using Machine Learning Models

\fnmMona \surHafez \fnmAya \surKazzaz \fnmAlhanouf \surAlsolami

Abstract

Legal Judgment Prediction (LJP) aims to predict judgment outcomes based on case descriptions. Several researchers have developed techniques that assist potential clients by predicting the outcome of legal cases. However, none of the proposed techniques were implemented in Arabic, and only a few attempts were implemented in English, Chinese, and Hindi. In this paper, we develop a system that utilizes deep learning (DL) and natural language processing (NLP) techniques to predict the judgment outcome from Arabic case scripts, especially in cases of custody and annulment of marriage. This system will assist judges and attorneys in improving their work and time efficiency while reducing sentencing disparity. In addition, it will help litigants, lawyers, and law students analyze the probable outcomes of any given case before trial. We use different machine learning and deep learning models, namely Support Vector Machine (SVM), Logistic Regression (LR), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM), with representation techniques such as TF-IDF and word2vec on the developed dataset. Experimental results demonstrate that compared with the five baseline methods, the SVM model with word2vec and LR with TF-IDF achieve the highest accuracy of 88% and 78% in predicting the judgment on custody cases and annulment of marriage, respectively. Furthermore, the LR and SVM models with word2vec and the BiLSTM model with TF-IDF achieved the highest accuracy of 88% and 69% in predicting the probability of outcomes on custody cases and annulment of marriage, respectively.

keywords:

legal judgment prediction, text classification, machine learning, deep learning, word embedding

1 Introduction

With the development of artificial intelligence (AI) and judicial informatization reform, intelligent justice has received a great deal of attention. The proper application of artificial intelligence technology can improve the efficiency of judicial practitioners, optimize justice methods, and reduce sentence inconsistency. Several intelligent solutions, such as intelligent legal document generation, automatic question answering, and automatic speech recognition in the court system, have been effectively implemented in casework. Legal judgment prediction (LJP) is the critical point of artificial judicial intelligence. LJP systems aim to predict judgment results according to the facts of cases and to offer feasible judgment suggestions, such as the prediction of charges, imprisonment terms, and applicable law articles. LJP systems can also assist litigants, attorneys, students, and teachers by improving their work and time efficiency while reducing the risk of making mistakes, so they serve as a valuable resource for professionals.

Working in the legal field involves a great deal of prediction, along with the hidden risks that come with it. When deciding whether or not to take a new case, an astute attorney evaluates the nature of the legal issues at hand and the case’s likely results. Typically, wise attorneys avoid making precise, unambiguous predictions about what they believe the conclusion will be. They distance themselves from a seemingly ironclad prediction, emphasizing that they do not guarantee the legal results. New attorneys often make bold predictions and risk ending up on the wrong side of a displeased client who later remembers how confident the first prediction was, especially if the case strays from the anticipated outcome. The advent of Artificial Intelligence (AI) in the legal industry enhances attorneys’ prediction abilities. It becomes more practical to analyze a huge corpus of legal cases and generate predictions for a new legal case using AI techniques such as Natural Language Processing and Machine Learning. Using AI and deep learning in LJP offers several benefits: fast decision-making in which input data can be verified instantly, judgments that are unbiased with respect to gender or nationality, analysis against previous instances with comparable patterns, and identification of situations where human and AI decisions differ substantially, which makes it easier to uncover corruption.

Several studies have investigated the possibility of predicting judgment outcomes in legal cases written in English, Hindi, and Chinese. No study has investigated the prediction of legal judgment in Arabic, especially in personal status cases such as custody, divorce, and annulment of marriage. Thus, in this paper we develop an LJP system that performs two tasks. The first task is to predict judgment decisions and relevant law articles or evidence in custody and annulment of marriage cases using the pleading of the case. The second is to predict the probability of possible outcomes of personal status cases (custody and annulment of marriage) given the plaintiff’s claim and the defendant’s answer. We developed an Arabic legal prediction dataset for personal status cases, using the Kingdom of Saudi Arabia (KSA) as a case study. The dataset was generated from a sample collection published by the Ministry of Justice and a simulated dataset produced by experts in the domain. We evaluated the proposed system using different machine learning and deep learning models, namely SVM, LR, LSTM, and BiLSTM, with different data representation techniques such as TF-IDF and word2vec on the developed dataset.

2 Related work

Several studies have utilized machine learning techniques to predict judicial outcomes. Aletras et al. [1] presented the first comprehensive study of predicting the result of cases heard by the European Court of Human Rights (ECHR) based only on the textual content presented in court, to determine whether a human rights article has been violated. A Support Vector Machine (SVM) classifier trained on textual information using N-grams and topics in Human Rights Documentation cases was used, achieving an accuracy of 79% in predicting court outcomes. Sil and Roy [2] also proposed a model that aims to deliver justice by providing judicial argument-based analysis using the SVM algorithm. The model is trained on features such as years of marriage, dowry details, and postmortem reports from a dataset of ‘dowry death’ cases in West Bengal. The model achieved 93% accuracy in binary classification. Medvedeva et al. [3] proposed a machine learning model to predict court decisions using textual information from ECHR cases. The model uses a linear SVM classifier and word n-gram TF-IDF features to analyze textual data, predicting 75% of cases correctly. Similarly, Shaikh et al. [4] proposed a machine learning model to predict outcomes of murder-related cases in the Delhi District Court. Several machine learning classifiers were evaluated in this task, including Classification and Regression Trees (CART), Bagging, Random Forest (RF), and SVM, to predict trial outcomes using several handcrafted features obtained by manually analyzing the cases. The results show that the best performance is obtained using Bagging, RF, SVM, and CART, with scores ranging between 90.70% and 91.86%.

Several studies have utilized deep learning techniques in predicting legal case outcomes. Zhang et al. [5] proposed an automatic law article prediction model based on a Deep Pyramid Convolutional Neural Network. They predict the relevant law article for cases using the case description and associated legal provisions. The results show that the proposed method outperforms various state-of-the-art baseline models on several public datasets [6]. Li et al. [7] proposed a neural network model based on an element-driven attention mechanism that takes the textual description of a criminal case as input and predicts the charges, applicable law articles, and prison terms. The approach is evaluated on a real-world dataset containing 125,830 judgment documents of criminal cases published by China Judgments. The model accuracy for element prediction, charge prediction, law article prediction, and prison term prediction was 98.83%, 97.92%, 98.16%, and 82.13%, respectively. Most existing methods follow the text classification framework, which fails to model the complex interactions among complementary case materials. Long et al. [8] formalized the task as Legal Reading Comprehension according to the legal scenario. The framework predicts the final judgment results based on three types of information (fact description, plaintiffs’ pleas, and law articles) and predicts whether a certain plea in a given civil case would be supported or rejected.

Multi-task learning (MTL) models were also examined for LJP in several studies. An MTL model considers the relationship between subtasks of the LJP task, such as law articles, charges, and penalty terms. Zhong et al. [9] proposed a topological multi-task learning framework that formulates the dependencies among subtasks (law articles, charges, fines, and the term of penalty) as a directed acyclic graph (DAG) to jointly predict the outcomes of the trial subtasks. Their approach outperforms single-task baselines and conventional multi-task learning models on three Chinese criminal case datasets. Wang and Jin [10] proposed an MTL LJP model based on CNN-BiGRU to improve the accuracy and efficiency of legal judgment prediction. Several data representations were used, including TF-IDF, word2vec word embedding, fact encoding, and document representation, achieving accuracy of $95.1\%$ for law article prediction, $95.2\%$ for charges, and $72.6\%$ for the term of penalty. Huang and Lin [11] built a multi-task deep neural network classification model that integrates a CNN with an attention LSTM model to achieve high precision in crime and related law prediction. The model is evaluated on a dataset of 626,600 judicial documents collected from the Internet, achieving an average F1 score of 93.62% and 90.84% for crime prediction and related law prediction, respectively.

Li et al. [12] proposed a multi-channel attentive neural network that uses an attention mechanism and a BiGRU hierarchical sequence encoder to learn a better semantic representation of, and the interaction among, different parts of case descriptions. The multi-channel attentive encoders take the law articles generated from the case fact description and the defendant persona as input and pass them to three hierarchical encoders (fact-channel, persona-channel, and article-channel encoders), which incorporate word-level and sentence-level attention context vectors, to predict the charges and prison term. Yao et al. [13] proposed a novel gated hierarchical multi-task learning network to jointly model multiple sub-tasks (law article, charge, and term of penalty) in judicial decision prediction. The model combines a Gated Hierarchical Encoder (GHE), which extracts in-depth semantic information of the fact description from multiple perspectives, and a Dependencies Auto-learning Predictor (DAP), which learns the dependencies among sub-tasks dynamically. The proposed model takes the fact description as input and predicts the law article, charge, and term of penalty.

The attention mechanism has been successfully used in many NLP tasks in recent years, and a number of studies have applied it to predicting judicial outcomes. Bao et al. [14] proposed an attention neural network that uses relevant articles to improve the performance and interpretability of charge prediction. The model uses the fact description to extract relevant law articles, which in turn help locate key information in the fact description and improve the performance of charge prediction. To address the challenges of predicting judgment in lengthy cases, Sukanya and J. Priyadarshini [15] presented a hierarchical attention deep neural network model with a fine-tuned transformer to predict legal case outcomes. Yang et al. [16] employed an LSTM with self-attention to simulate a judge’s recurrent reading behavior, utilizing semantic mutual information between evidence and article. Xu et al. [17] presented an end-to-end model to solve the task of LJP. To distinguish confusing charges, they propose a novel graph neural network that automatically learns subtle differences between confusing law articles by capturing essential but rare features, and they design a novel attention mechanism that exploits the learned differences to extract discriminative features from fact descriptions.

Kowsrihawat et al. [18] proposed a prediction model for criminal cases using end-to-end BiGRU deep neural networks. Their model imitates a process of legal interpretation, whereby recurrent neural networks read the facts from an input case and compare them against relevant legal provisions with the attention mechanism. Some related studies have combined several deep neural models. Yuan et al. [19] proposed a framework for automated judging based on an ensemble strategy that combines many deep neural network models and manual law features to solve the problem of data imbalance. They also increase the framework’s performance by enhancing the data. Their results show that data enhancement and the ensemble strategy can improve the performance of judgment prediction.

Transformer-based models have had a tremendous impact on many NLP problems, and they have also been used successfully in predicting the outcome of legal cases. Wang et al. [20] utilized the widely used pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) for LJP. The BERT model significantly improves accusation prediction accuracy compared to other deep learning models with word2vec representation. Similarly, Chalkidis et al. [21] developed a hierarchical version of BERT for judgment prediction. They present a new publicly available dataset of English legal judgment prediction cases from the ECHR. Zhu et al. [22] proposed a Transformer-Hierarchical-Attention Multi-Extra Network that takes the fact description, court view, and basic information of the defendant as input to predict law articles, charges, and terms of penalty. The dataset they used, CJO, consists of criminal cases published by the Chinese government on China Judgment Online.

Some studies have employed tensor decomposition techniques in legal case outcome prediction. Guo et al. [23] developed a new algorithm based on tensor decomposition and ridge regression for judgment prediction. The model was tested on a dataset obtained from the Chinese Referee Document Network. In the same vein, Guo et al. [24] built an intelligent judgment approach based on a relationship-driven recurrent neural network and restricted tensor decomposition. The recurrent neural network was used for intelligent judgment of multiple accusations in legal cases. The model was tested on legal cases obtained from a Chinese refereeing study network.

Previous studies in LJP have utilized various machine learning and deep learning techniques, including SVM, LR, and K-Nearest Neighbors. Most studies utilized deep learning techniques and sequence models, while some used attention mechanisms for predicting legal case outcomes. Tensor decomposition-based models were also employed. LJP models rely on law articles and predict law articles, charges, and penalties. Researchers have formulated the problem as single-task or multi-task learning, predicting single or multiple outcomes. Previous studies were mainly conducted in Chinese, Hindi, and English, and mostly focused on criminal cases. However, no studies have been conducted on Arabic text or on personal status cases, as personal status cases in KSA are adjudicated at the judge’s discretion rather than on law articles.

3 Dataset

Several datasets have been used in the field of LJP. These datasets were generally written in Chinese, English, and Hindi. To serve the purpose of this study, we developed an Arabic LJP dataset for Saudi Arabian personal status cases. We first collected personal status cases published by the Ministry of Justice, and we also generated a new sample of cases with the help of experts in the field. Table 1 shows information about the Ministry of Justice dataset. Each article in the sample collection contains the topic, which provides details of the case category; the evidence, which provides details of the reasons and legal evidence relied upon in judging the case; a summary of the case, which includes a summary of the claim, answer, and judgment; and the final judgment, which is a description of the claim, answer, and sessions that took place in the court, in addition to the judgment on the case. The sample collection has five types of cases: custody, annulment of marriage, visiting, divorce, and alimony. In this research, we consider only custody and annulment of marriage cases because of the limitations and variation of outcomes of the remaining personal status cases in the current sample collection. The dataset has a total of $49$ cases, each with a true judgment outcome.

The textual content of the dataset was extracted from Portable Document Format (PDF) files, and many words were extracted incorrectly because of the font in which the data was written. Thus, several data cleaning steps and manual corrections were performed to correct the extracted data.

Source	Ministry of justice
Content	Topic, evidence, case summary, judgment
Types	Numbers of Cases
Custody*	20
Annulment of marriage*	29
Visiting	2
Divorce	0
Alimony	44

Table 1: Ministry of Justice Sample collections

We formulated the problem as a multi-class classification problem. The multi-class classification task takes a pleading text as input and generates the judgment (one of multiple outcomes) and the reasons/evidences (one of multiple reasons) as output. The original trial cases were analyzed with experts in the law field (lawyers and professors), and a list of the possible and most common judgment decisions was obtained. Based on this analysis, we formulated the problem of predicting the judgment in custody cases as a multi-class classification problem with four classes: the mother is granted full custody; the father is granted full custody; children over seven years old have parental choice, while those under seven remain in the custody of their mother until they reach seven; and other. Table 2 shows the judgment classes in custody cases. For the reasons or law article prediction, we formulated the problem as a multi-class classification problem with eight classes that correspond to the law articles used in the custody cases. For annulment of marriage cases, we also had four classes: annulment of marriage with compensation, annulment of marriage without compensation, deny annulment, and other. Table 3 shows the classes of annulment of marriage. The reasons or law article prediction in the annulment of marriage cases was formulated as a multi-class classification problem with 11 classes.

Table 4 shows an example of the multi-class classification problem.

Class Name in Arabic	Class Name in English
\RLتخيير الابناء فوق السبع سنوات، وتكون الحضانة للام لمن لم يبلغ سبعه سنوات	Children over seven years old have parental choice, while those under seven have custody with their mother until they reach seven
\RLحضانة الاولاد لوالدتهم	mother granted custody of children
\RLحضانة الاولاد لوالدهم	father granted custody of children
\RLأخرى	Other

Table 2: Judgement Decision Classes in Custody Cases

Class Name in Arabic	Class Name in English
\RLفسخ نكاح لعوض	annulment of marriage for compensation
\RLفسخ نكاح بدون عوض	annulment of marriage without compensation
\RLفسخ نكاح	annulment of marriage
\RLرد دعوة المدعي	deny Annulment

Table 3: Judgement Decision Classes in the Annulment of Marriage Cases

Type of text	Arabic original text	Translated text
Plaintiff claim (\RLنص الدعوى )	\RLادعت المدعية على الغائب مجلس الحكم بأنه كان…	plaintiff claimed that the defendant (absent from the Governing Council) is..
Defendant answer(\RLنص الاجابة)	\RLوبعرض ذلك على المدعي اجاب قائلا ماذكرت…	by presenting it to the defendant, he replied..
Pleading (\RLنص المرافعة)	\RLما ذكرته المدعية من الزواج والأولاد ثم الطلاق فهذا كله صحيح، ولكن الطلاق…	what the wife mentioned about marriage and divorce is true, but the divorce..
Reasons(\RLالأسباب)	\RL ما صح عن النبي صلى الله عليه وسلم خير غلاما بين أبيه وأمه	The Prophet, PBUH, asked a boy to choose between his mother and father..
Judgment Decision(\RL الحكم)	\RLالحكم بحضانة البنت للأم	mother shall grant custody of her daughter
Plaintiff claim (\RLنص الدعوى )	\RL ادعى المدعي وكالة قائلا ان الام قد تزوجت	plaintiff claimed that the defendant got married
Defendant answer(\RLنص الاجابة)	\RL وبعرض ذلك على المدعي عليها ما ذكره	by presenting it to the defendant, she replied
Pleading (\RL المرافعة)	\RLموكلي يطلب الحضانة للأسباب التالية	My client requested custody for the following reasons..
Reasons(\RLالأسباب)	\RLأنت أحق به ما لم تنكحي	You are more entitled to him as long as you do not get married.
Judgment Decision(\RL الحكم)	\RLالحكم بحضانة البنت للأب	father shall grant custody of his daughter

Table 4: Example of the multi-class classification problem

To increase the model performance, another simulated dataset was collected with the help of experts in the law domain. The experts were selected among Master and PhD students, and the cases were validated by Arabic-speaking law professors. The dataset has a larger number of custody and annulment of marriage cases. As shown in Table 5, the simulated dataset consists of the plaintiff’s claim, the defendant’s response, the pleading (a description of the sessions that took place in court), the judgment, and the evidences (the reasons and legal evidence relied upon in judging the case).

Source	Law experts (master, PhD students, Professors)
Content	claim, answer, pleading, judgment, evidences
Types	Number of Cases
Custody	55
Annulment of marriage	24

Table 5: Simulated dataset

4 Methodology

4.1 Problem formulation

In this study, we developed two LJP models based on Arabic personal status cases. The first model aims to predict the judgment and reasons (law articles) given the court pleading of Arabic personal status cases (custody and annulment of marriage). Given a pleading of a case $x$ that consists of a sequence of words $x=\{x_{1},x_{2},\ldots,x_{k}\}$, where $k$ is the length of $x$, the goal is to predict the corresponding judgment result $y_{i}\in Y$, where $Y$ is the set of possible judgment decisions (the label set in our task).

The second model aims to predict the probability of the possible judgment decisions and associated reasons given the plaintiff’s claim and the defendant’s answer. Suppose the claim text $m$ of a case consists of a sequence of words, $m=\{m_{1},m_{2},\ldots,m_{f}\}$, where $f$ is the length of the claim text, and the answer text $n$ consists of a sequence of words, $n=\{n_{1},n_{2},\ldots,n_{j}\}$, where $j$ is the length of the answer. We formulated this task as a multi-label classification problem in which the goal is to predict the probability of each of the judgment outcomes $y_{i}\in Y$, where $Y$ is the label set of our task, given $m$ and $n$.
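The two tasks can be written compactly as follows. This is only a notational sketch: the symbol $\theta$ (the model parameters), the hat notation, and the probabilities $p_{i}$ are introduced here for illustration and do not appear in the original formulation.

```latex
\begin{align}
\hat{y} &= \arg\max_{y_{i}\in Y} P_{\theta}(y_{i}\mid x), \\
p_{i} &= P_{\theta}(y_{i}\mid m,n)\in[0,1], \qquad i=1,\ldots,|Y|.
\end{align}
```

In the first task a single label $\hat{y}$ is returned for the pleading $x$, whereas in the second task one probability $p_{i}$ is reported per outcome, and these probabilities are not required to sum to one.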

4.2 Data Preprocessing

A series of data preprocessing steps was performed to clean and normalize the textual data and to transform it into a numerical representation that the models can analyze. Preprocessing includes manual correction of misspelled extracted words; tokenization; data cleaning, including the removal of stop words, unimportant words, and dates; and text normalization, including the removal of diacritical marks.

1.

Tokenization

Tokenization is essentially splitting a phrase, sentence, paragraph, or an entire text document into smaller units, such as individual words or terms. Each of these smaller units is called a token. Tokenization is an important step because the meaning of the text can be interpreted by analyzing the words present in it.
2.

Remove stop words

One of the significant forms of preprocessing is filtering out uninformative data. In natural language processing, such uninformative words are referred to as “stop words.” Eliminating stop words has several advantages, including shortened indexing structures, faster processing, and improved retrieval effectiveness. The Arabic language has many lexical tokens, which implies that stop words can be found in large quantities.
3.

Remove dates

The data contains many dates, such as dates of birth and session dates, which were excluded in this study.
4.

Remove diacritics
Diacritics are marks placed above or below (or sometimes next to) a letter in an Arabic word to indicate a particular pronunciation with regard to accent, tone, or stress, as well as meaning, especially when a homograph exists without the marked letter or letters. In this study, the diacritics were removed to normalize the text, reduce the feature space dimensionality, and ensure there is no difference between a word with diacritics and one without.
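The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `preprocess`, the tiny stop-word list, and the numeric date pattern are all assumptions made for the example. Diacritics are stripped before tokenization here because combining marks would otherwise split words under the `\w+` pattern.

```python
import re

# Arabic diacritic marks (tashkeel), from fathatan (U+064B) to sukun (U+0652).
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Numeric dates such as 12/05/1440 or 1440-05-12 (an assumed format).
DATES = re.compile(r"\d{1,4}[/-]\d{1,2}[/-]\d{1,4}")
# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"في", "من", "على", "عن", "ثم"}


def preprocess(text):
    """Return the cleaned tokens of an Arabic case text."""
    text = DATES.sub(" ", text)               # remove dates
    text = DIACRITICS.sub("", text)           # remove diacritics
    tokens = re.findall(r"\w+", text)         # tokenize on word characters
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop words


if __name__ == "__main__":
    print(preprocess("الحكم في 12/05/1440 بحضانة البنت للأم"))
```

The order of the cleaning steps is a design choice of this sketch; the study itself does not state the order in which they were applied.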

4.3 Text Representation

Representing text as real-valued vectors that capture word semantics and similarity between words is an essential step in natural language processing tasks. In this study, we utilized two widely used text representations. The first is a discrete text representation in which words are represented by their indexes in a dictionary built from a larger corpus. In particular, we use a weighted version of the bag-of-words (BOW) model, namely term frequency-inverse document frequency (TF-IDF). TF-IDF normalizes the word frequency by giving words that occur in many documents less weight, according to the following equation:

\begin{equation}
\text{TF-IDF}(w,d)=TF(w,d)\times IDF(w) \tag{1}
\end{equation}

Where $TF(w,d)$ is the frequency of word $w$ in the document $d$ and $IDF(w)$ is defined as follows:

\begin{equation}
IDF(w)=\log\frac{N}{df(w)} \tag{2}
\end{equation}

Here, $N$ is the total number of documents and $df(w)$ is the number of documents containing the word $w$. The second word representation is a distributed word embedding that allows words with similar meanings to have similar representations. The distributed text representation used is the word2vec model for Arabic text (AraVec) [25]. Word2vec models provide a powerful, dense word representation that has been widely used in natural language processing tasks due to its ability to capture semantic similarities between words.
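As a worked illustration of the TF-IDF weighting, the sketch below computes $TF(w,d)\times\log\frac{N}{df(w)}$ for a toy corpus, using the term frequency and document frequency as defined above. The function name `tf_idf` and the toy documents are hypothetical. Library implementations often use smoothed or normalized variants of this formula, so their values may differ from this plain version.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute plain TF-IDF weights for a list of tokenized documents.

    Each document is a list of tokens. The weight of word w in document d
    is TF(w, d) * log(N / df(w)), where N is the number of documents and
    df(w) is the number of documents that contain w.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


if __name__ == "__main__":
    toy_docs = [["custody", "mother", "custody"], ["mother", "annulment"]]
    for row in tf_idf(toy_docs):
        print(row)
```

In the toy corpus, "mother" appears in every document, so its weight is zero, while "custody" appears twice in one document only and receives the largest weight.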

4.4 Model and Hyperparameter

We implemented several machine learning and deep learning models, each with TF-IDF and word2vec word representations, to predict the judgment and evidences from the pleading text. Two machine learning models are used in our experiments, namely SVM and LR, while the implemented deep learning models are LSTM and BiLSTM.

1.

SVM Model
We used the SVM model due to its simplicity and its efficiency in high-dimensional spaces. We applied a grid search to obtain the best hyperparameters: C (the penalty added for each misclassified data point), gamma (which controls the distance of influence of a single training point), and the kernel function. We took the best kernel function from the grid search and then tried different values of C and gamma to obtain the best hyperparameters.
2.

LR Model
We followed the same procedure as for SVM, but we used LogisticRegression() with solver='sag', which uses the cross-entropy loss and supports the multi-class case.
3.

LSTM Model
The architecture of the LSTM model consists of the following layers. The first layer is the input layer with shape (1200,) and data type 'int32'. The second layer is the embedding layer. The next layer is the LSTM layer with 300 neurons, followed by a dense layer with 300 neurons and a 'relu' activation function. The last layer, a dense layer, consists of 4 neurons with a 'softmax' activation function, and the model is trained with the 'sparse_categorical_crossentropy' loss function. For the LSTM model with word2vec word representation, the embedding layer is seeded with AraVec word embedding weights (300-dimensional Twitter Skip-gram version 3).
4.

BiLSTM Model
For this model, we used the same hyperparameters as the LSTM model, but we added the Bidirectional wrapper around an LSTM layer (64 neurons), which propagates the input forward and backward through the LSTM layer to learn long-term dependencies from both sides.
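To give a rough sense of the size of the two recurrent models, the standard parameter-count formulas for LSTM and dense layers can be applied to the layer sizes listed above. This is an illustrative sketch under stated assumptions: the embedding output is taken to be 300-dimensional (as with AraVec), each LSTM layer is a standard one with bias terms, the bidirectional layer concatenates its forward and backward outputs, and the embedding weights are left out because the vocabulary size is not given. The helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Weights of one standard LSTM layer.

    Four gates, each with input weights, recurrent weights, and a bias.
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights of one fully connected layer with bias."""
    return input_dim * units + units


EMBED_DIM = 300  # assumed: matches the 300-dimensional AraVec vectors
N_CLASSES = 4    # four judgment classes

# LSTM(300) -> Dense(300, relu) -> Dense(4, softmax)
lstm_total = (lstm_params(EMBED_DIM, 300)
              + dense_params(300, 300)
              + dense_params(300, N_CLASSES))

# Bidirectional(LSTM(64)) -> Dense(300, relu) -> Dense(4, softmax)
bilstm_total = (2 * lstm_params(EMBED_DIM, 64)
                + dense_params(2 * 64, 300)
                + dense_params(300, N_CLASSES))

if __name__ == "__main__":
    print("LSTM stack parameters:  ", lstm_total)
    print("BiLSTM stack parameters:", bilstm_total)
```

Under these assumptions both stacks have hundreds of thousands of trainable weights even before the embedding layer is counted, far more than the number of cases listed in Tables 1 and 5.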

To predict the probability of possible outcomes using the claim and answer, we used SVM, LR, LSTM, and BiLSTM with TF-IDF and word2vec text representations. We used the same hyperparameters and methods described previously. In LSTM and BiLSTM, the output layer has four neurons with a 'sigmoid' activation function, which returns a probability for each class (judgment outcome).
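The difference between the 'softmax' output used for judgment prediction and the 'sigmoid' output used for outcome probabilities can be seen on a small example. The sketch below is illustrative only; the raw scores are made-up values for four hypothetical judgment outcomes, and the function names are not taken from the study.

```python
import math


def softmax(scores):
    """Turn raw scores into one distribution over mutually exclusive classes."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores):
    """Turn each raw score into an independent probability in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0, 0.0]  # hypothetical raw scores for four outcomes
    print("softmax:", [round(p, 3) for p in softmax(scores)])
    print("sigmoid:", [round(p, 3) for p in sigmoid(scores)])
```

The softmax values always sum to one, which suits picking a single judgment, while the sigmoid values are computed per outcome and need not sum to one, which suits the multi-label formulation of the second task.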

5 Results and Discussion

5.1 Evaluation Metrics

To evaluate the judgment and evidences prediction, we used accuracy, precision, recall rate, and F1-score.
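The four metrics can be computed from true and predicted labels as sketched below. The averaging scheme is not stated in the text, so macro-averaging over classes is an assumption of this example, as are the function name `macro_scores` and the convention of scoring a class as 0 when its denominator is empty.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0  # assumed 0 if undefined
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_labels = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


if __name__ == "__main__":
    y_true = ["mother", "mother", "mother", "father"]
    y_pred = ["mother", "mother", "mother", "mother"]
    print(macro_scores(y_true, y_pred))
```

In the toy example, a classifier that always predicts the majority class reaches 75% accuracy but only 50% macro recall, which shows why accuracy alone can be misleading on small, imbalanced test sets.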

5.2 Results and Analysis

We evaluated the performance on two LJP tasks: predicting the judgment and predicting the probability of possible outcomes. We experimented with different machine learning and deep learning models in order to choose the best models for both tasks. Tables 6, 7, and 8 show the experimental results of predicting the judgment, the law articles, and the probability of judgments, respectively. As shown in Table 6, for predicting the judgment of custody cases on the dataset that combines data from the Ministry of Justice and simulated data, the SVM model with word2vec representation gave the highest accuracy of 88%. Deep learning models such as LSTM and BiLSTM did not outperform the SVM model, due to the relatively small size of the dataset. Table 6 also shows the results for predicting the judgment of the annulment of marriage cases on the combined dataset. The LR model with TF-IDF data representation gives the highest accuracy of 78%. The results indicate that the accuracy of predicting the judgment in custody cases is higher than in annulment of marriage cases.

	Custody				Annulment of Marriage
Models	P(%)	R(%)	F1(%)	Acc(%)	P(%)	R(%)	F1(%)	Acc(%)
SVM-TFIDF	60	44.33	46.33	81	62.5	63.5	62.5	78
SVM-Word2Vec	88	100	93	88	23.75	27.25	24.5	56
LR-TFIDF	75	100	86	75	62.5	63.5	62.5	78
LR-Word2Vec	86	100	75	75	48.75	38.5	40.75	50
LSTM-TFIDF	31	33.33	29.33	87.5	24	27.75	25	56.25
LSTM-Word2Vec	31	33.33	29.33	87.5	24	27.75	25	56.25
BILSTM-TFIDF	31	33.33	29.33	87.5	24	27.75	25	56.25
BILSTM-Word2Vec	75	100	86	75	63	100	77.5	62.5

Table 6: Results of experiments for predicting the judgment on Custody and Annulment of Marriage Cases using machine learning and deep learning models with different word representations

	Custody				Annulment of Marriage
Models	P(%)	R(%)	F1(%)	Acc(%)	P(%)	R(%)	F1(%)	Acc(%)
SVM-TFIDF	17.66	17	10	19	23.63	20.9	25	34
SVM-Word2Vec	3.66	3.66	5.55	25	13	12.72	10.63	25
LR-TFIDF	11.88	13.33	5	12	6.81	10.81	8.27	41
LR-Word2Vec	38.88	33.33	35.22	12	6.45	6.36	6.18	50
LSTM-TFIDF	3.66	3.66	5.55	25	27.57	23.05	26.06	37.5
LSTM-Word2Vec	3.66	3.66	5.55	25	27.57	23.05	26.06	37.5
BiLSTM-TFIDF	3.66	3.66	5.55	25	27.57	23.05	26.06	37.5
BiLSTM-Word2Vec	40.5	34.71	36.68	12.5	12.72	15.39	15.73	30.25

Table 7: Results of experiments for predicting the evidence (law articles) in custody and annulment of marriage cases using machine learning and deep learning models with different text representations

Table 7 shows that the highest accuracy for predicting law articles in custody cases is 25%, obtained by LSTM and BiLSTM with TF-IDF representation and matched by SVM and LSTM with word2vec. Accuracy for predicting law articles in annulment of marriage cases was higher, with the best result of 50% obtained using LR with word2vec. These low scores indicate how challenging law article prediction is in both custody and annulment of marriage cases.

	Custody				Annulment of Marriage
Models	P(%)	R(%)	F1(%)	Acc(%)	P(%)	R(%)	F1(%)	Acc(%)
SVM-TFIDF	60	44.33	46.33	81	94.25	91.75	98.75	68
SVM-Word2Vec	29.33	33.33	31	88	31.66	36.33	81.33	56
LR-TFIDF	25	33.33	28.66	75	31.25	34.75	32.75	66
LR-Word2Vec	29.33	33.33	31	88	48.75	38.5	40.75	50
LSTM-TFIDF	29.33	33.33	31	87.5	32	36.33	81.33	56.25
LSTM-Word2Vec	29.33	33.33	31	87.5	32	36.33	81.33	56.25
BiLSTM-TFIDF	29.33	33.33	31	87.5	32	36.33	34.33	68.75
BiLSTM-Word2Vec	29.33	33.33	31	87.5	48.75	38.5	40.75	50

Table 8: Results of experiments for predicting the probability of possible outcomes on custody and annulment of marriage cases using machine learning and deep learning models

Table 8 shows the results for predicting the probability of judgment outcomes. In custody cases, SVM and LR with word2vec representation gave the highest accuracy of 88%. In annulment of marriage cases, BiLSTM with TF-IDF gave the highest accuracy of 68.75%.
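
Several rows in Tables 6 and 8 pair a recall of 33.33% with a much higher accuracy. One way such a pattern can arise is a classifier that always predicts the majority class, if the scores are macro-averaged over three outcome classes. The averaging scheme is not stated in this section, so macro averaging is an assumption here. The sketch below illustrates the effect on a hypothetical eight-case test split; the label names and data are illustrative and not taken from the dataset.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) and accuracy.

    Per-class scores are averaged with equal weight per class (macro
    averaging); a class with no predicted or no true instances scores 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy


# Hypothetical imbalanced test split with three outcome labels.
y_true = ["accept"] * 6 + ["reject", "partial"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["accept"] * 8

p, r, f1, acc = macro_scores(y_true, y_pred)
print(f"P={p:.2%} R={r:.2%} F1={f1:.2%} Acc={acc:.2%}")
# prints "P=25.00% R=33.33% F1=28.57% Acc=75.00%"
```

Under these assumptions, accuracy alone can overstate how well a model separates the outcome classes, so the macro-averaged scores are worth reading alongside it when comparing rows.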

6 Conclusion

Several researchers have developed techniques for predicting the outcomes of cases in the legal profession. However, none of the proposed techniques were implemented in Arabic, and only a few attempts were implemented in English. This work aims to develop an Arabic legal judgment prediction system that uses deep learning models, such as LSTM and BiLSTM, and machine learning techniques, such as SVM and LR, to predict the judgment outcome and law articles from Arabic case scripts, specifically personal status cases in Saudi Arabia. To this end, we developed an Arabic LJP dataset from publicly available sample collections combined with data artificially generated by law experts. The dataset was then analyzed to formulate the LJP task as a multi-class classification problem. We investigated several machine learning and deep learning models with different types of word representation. The results indicate that for predicting the judgment in custody cases from pleading text, the SVM model with word2vec representation achieved the highest accuracy of 88%. For annulment of marriage cases, the LR model with TF-IDF achieved the best result with 78% accuracy. For predicting the law article or evidence in custody cases, BiLSTM with TF-IDF gave 25% accuracy, while for annulment of marriage cases, the LR model with word2vec achieved 50% accuracy. For predicting the most probable judgment from claim and answer, the highest accuracy was 88% in custody cases, obtained using SVM and LR with word2vec representation, and 68.75% in annulment of marriage cases, obtained using BiLSTM with TF-IDF.

References

Aletras et al. [2016] Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D., Lampos, V.: Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective. PeerJ Computer Science 2, 93 (2016)
Sil and Roy [2020] Sil, R., Roy, A.: A novel approach on argument based legal prediction model using machine learning. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 487–490 (2020). IEEE
Medvedeva et al. [2020] Medvedeva, M., Vols, M., Wieling, M.: Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law 28(2), 237–266 (2020)
Shaikh et al. [2020] Shaikh, R.A., Sahu, T.P., Anand, V.: Predicting outcomes of legal cases based on legal factors using classifiers. Procedia Computer Science 167, 2393–2402 (2020)
Zhang et al. [2019] Zhang, H., Wang, X., Tan, H., Li, R.: Applying data discretization to DPCNN for law article prediction. In: NLPCC (2019)
Xiao et al. [2018] Xiao, C., Zhong, H., Guo, Z., Tu, C., Liu, Z., Sun, M., Feng, Y., Han, X., Hu, Z., Wang, H., et al.: CAIL2018: A large-scale legal dataset for judgment prediction. arXiv preprint arXiv:1807.02478 (2018)
Li et al. [2019] Li, S., Liu, B., Ye, L., Zhang, H., Fang, B.: Element-aware legal judgment prediction for criminal cases with confusing charges. In: 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), pp. 660–667 (2019). IEEE
Long et al. [2019] Long, S., Tu, C., Liu, Z., Sun, M.: Automatic judgment prediction via legal reading comprehension. In: China National Conference on Chinese Computational Linguistics, pp. 558–572 (2019). Springer
Zhong et al. [2018] Zhong, H., Guo, Z., Tu, C., Xiao, C., Liu, Z., Sun, M.: Legal judgment prediction via topological learning. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3540–3549 (2018)
Wang and Jin [2020] Wang, C., Jin, X.: Study on the multi-task model for legal judgment prediction. In: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), pp. 309–313 (2020). IEEE
Huang and Lin [2019] Huang, D., Lin, W.: A model for legal judgment prediction based on multi-model fusion. In: 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), pp. 892–895 (2019). IEEE
Li et al. [2019] Li, S., Zhang, H., Ye, L., Guo, X., Fang, B.: MANN: A multichannel attentive neural network for legal judgment prediction. IEEE Access 7, 151144–151155 (2019)
Yao et al. [2020] Yao, F., Sun, X., Yu, H., Yang, Y., Zhang, W., Fu, K.: Gated hierarchical multi-task learning network for judicial decision prediction. Neurocomputing 411, 313–326 (2020)
Bao et al. [2019] Bao, Q., Zan, H., Gong, P., Chen, J., Xiao, Y.: Charge prediction with legal attention. In: CCF International Conference on Natural Language Processing and Chinese Computing, pp. 447–458 (2019). Springer
Sukanya and J.Priyadarshini [2021] Sukanya, G., J.Priyadarshini: A meta analysis of attention models on legal judgment prediction system. International Journal of Advanced Computer Science and Applications 12(2) (2021) https://doi.org/10.14569/IJACSA.2021.0120266
Yang et al. [2019] Yang, Z., Wang, P., Zhang, L., Shou, L., Xu, W.: A recurrent attention network for judgment prediction. In: International Conference on Artificial Neural Networks, pp. 253–266 (2019). Springer
Xu et al. [2020] Xu, N., Wang, P., Chen, L., Pan, L., Wang, X., Zhao, J.: Distinguish confusing law articles for legal judgment prediction. arXiv preprint arXiv:2004.02557 (2020)
Kowsrihawat et al. [2018] Kowsrihawat, K., Vateekul, P., Boonkwan, P.: Predicting judicial decisions of criminal cases from Thai Supreme Court using bi-directional GRU with attention mechanism. In: 2018 5th Asian Conference on Defense Technology (ACDT), pp. 50–55 (2018). IEEE
Yuan et al. [2019] Yuan, L., Wang, J., Fan, S., Bian, Y., Yang, B., Wang, Y., Wang, X.: Automatic legal judgment prediction via large amounts of criminal cases. In: 2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 2087–2091 (2019). IEEE
Wang et al. [2020] Wang, Y., Gao, J., Chen, J.: Deep learning algorithm for judicial judgment prediction based on BERT. In: 2020 5th International Conference on Computing, Communication and Security (ICCCS), pp. 1–6 (2020). IEEE
Chalkidis et al. [2019] Chalkidis, I., Androutsopoulos, I., Aletras, N.: Neural legal judgment prediction in English. arXiv preprint arXiv:1906.02059 (2019)
Zhu et al. [2020] Zhu, K., Guo, R., Hu, W., Li, Z., Li, Y.: Legal judgment prediction based on multiclass information fusion. Complexity 2020 (2020)
Guo et al. [2020] Guo, X., Zhang, H., Ye, L., Li, S., Zhang, G.: TenRR: An approach based on innovative tensor decomposition and optimized ridge regression for judgment prediction of legal cases. IEEE Access 8, 167914–167929 (2020)
Guo et al. [2019] Guo, X., Zhang, H., Ye, L., Li, S.: RnRTD: Intelligent approach based on the relationship-driven neural network and restricted tensor decomposition for multiple accusation judgment in legal cases. Computational Intelligence and Neuroscience 2019 (2019)
Soliman et al. [2017] Soliman, A.B., Eissa, K., El-Beltagy, S.R.: AraVec: A set of Arabic word embedding models for use in Arabic NLP. Procedia Computer Science 117, 256–265 (2017)