Understanding Court Judgments in Brazil. NLP Project in Portuguese that seeks to analyze what factors influence the decision of judges in bankruptcy and judicial recovery proceedings in Brazil.

thomas-ferraz/vox-legis

Vox Legis

Thank you for visiting!  

Members:

  • Marcelo de Souza (marcelo [dot] mcs [at] ime [dot] usp [dot] br)
  • Pedro Almeida (pedro [dot] hba [at] usp [dot] br)
  • Ricardo Tanaka (raktanaka [at] gmail [dot] com)
  • Thomas Ferraz (thomas [dot] ferraz [at] usp [dot] br)
  • Verena Saeta (verenacsaeta [at] usp [dot] br)

Supervisor

  • Rafael Ferreira (rafaelferreira [at] usp [dot] br)

Introduction

The project was developed during the second semester of 2020 under the name Vox Legis, in the course MAC0434/MAC6967 offered by Prof. Fabio Kon at IME-USP, under the supervision of Prof. Rafael Ferreira (FEA-USP).

Development was done in Jupyter Notebooks, which are in the /codes directory together with a README presenting further information. The notebooks can be run on the Google Colaboratory platform, with the data in Google Drive, or on a local machine, provided the data is stored locally.

Due to privacy concerns, the dataset is not provided; it must be requested from Prof. Rafael Ferreira.

Abstract

The main goal of this project is to create an algorithm capable of extracting information from the records of a lawsuit to obtain knowledge and outcomes such as:

  • Factors unrelated to the characteristics of the process that can affect judges' decisions:
    • Ethnic, racial, and gender prejudices;
    • Judge's experience;
    • Judge's political opinions;
    • Judge's personal sports preferences;
    • Other biases.
  • A mapping of the factors that may affect the outcome on the merits of the case, yielding a measure of the Judiciary's impartiality;
  • Analysis of the existence of variance in decisions as a measure of legal uncertainty;
  • Identification of biases and preferences that have consequences in the real world.
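
The variance item in the list above can be made concrete with a small sketch. Assuming each decision has already been labeled as favorable (1) or unfavorable (0) to the applicant, one simple illustrative measure is the spread of the per-judge favorable rates. The function name `positive_rate_variance` and the input layout are hypothetical and not taken from the project's notebooks.

```python
from collections import defaultdict
from statistics import pvariance


def positive_rate_variance(decisions):
    """Return (per-judge favorable rates, population variance of those rates).

    `decisions` is an iterable of (judge_name, outcome) pairs, where outcome
    is 1 for a decision favorable to the applicant and 0 otherwise.
    This input layout is an assumption made for illustration.
    """
    outcomes_by_judge = defaultdict(list)
    for judge, outcome in decisions:
        outcomes_by_judge[judge].append(outcome)

    # Share of favorable decisions for each judge.
    rates = {
        judge: sum(outcomes) / len(outcomes)
        for judge, outcomes in outcomes_by_judge.items()
    }
    # Spread of those shares across judges; 0.0 means all judges decide alike.
    return rates, pvariance(list(rates.values()))
```

A variance close to zero would suggest that judges decide similar cases at similar rates. A real analysis would also need to control for differences between the cases each judge receives.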

Accomplished

  1. Get samples of lawsuit records;
  2. Extract administrative data from these procedures;
  3. Download the PDFs of the files of these procedures;
  4. Extract strings from the PDF;
  5. Identify who attached each document to the procedure file;
  6. If the document was attached by a lawyer, identify which party is represented by this lawyer;
  7. Identify which of the documents are judicial sentences;
  8. Classify each decision as positive or negative from the point of view of the applicant (the party who started the legal action).
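
Steps 5 and 6 above can be sketched as a name-matching problem: the name of whoever attached a document is compared with the lawyers listed for each party in the administrative data. The sketch below is a minimal illustration under that assumption and is not the notebooks' actual implementation. The helper names `normalize_name` and `party_for_document` and the dictionary layout are hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and collapse whitespace for name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def party_for_document(attached_by, lawyers_by_party):
    """Return the party represented by the lawyer who attached a document.

    `lawyers_by_party` maps a party label to the lawyer names listed for
    that party (hypothetical layout). Returns None when no lawyer matches,
    for example when the document was attached by the court itself.
    """
    target = normalize_name(attached_by)
    for party, lawyers in lawyers_by_party.items():
        if any(normalize_name(lawyer) == target for lawyer in lawyers):
            return party
    return None
```

For example, with the fictitious mapping `{"applicant": ["João da Silva"], "respondent": ["Maria Conceição"]}`, the call `party_for_document("JOAO  DA SILVA", ...)` resolves to `"applicant"` despite the differences in case, accents and spacing.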

Unsuccessful

Data

In this project, we work with data from lawsuits at the São Paulo Court of Justice (TJSP). All documents and other information from each proceeding currently pending at the TJSP can be accessed through the court's electronic system, e-Saj. Basic case information can be accessed by anyone through the website's procedural consultation page, given the case number. Access to the complete case file, containing all its documents, requires attorney credentials and authentication on the website before opening the procedural consultation page.

The raw data for this project refers to bankruptcy and judicial recovery lawsuits. It was collected from a list of all bankruptcy and judicial recovery proceedings initiated between 2008 and 2017, sent by the TJSP in early 2018. Using this list and the credentials of a lawyer collaborating on the project, a script was written with the Selenium library to access e-Saj; authenticate; extract the basic information of each case, such as the court, district, name of the judge, names of the parties, qualified lawyers, etc. (see Dataset 1); and download the PDF files of the case file (see Dataset 2).

Dataset 1 - Proceedings Administrative Data

Each judicial process has an HTML page generated by the TJSP system, containing the main information about the process. The figure below shows an example for process number 1037133-31.2015.8.26.0100.

Figure 1: A bankruptcy lawsuit page on e-Saj

For each bankruptcy or judicial recovery case number sent by the TJSP, the page corresponding to that case number was scraped from the Court's website. The collected information was saved in .rds files. These .rds files are in the dataset1 folder in Google Drive.
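
The case number in the example above appears to follow the Brazilian unified numbering layout NNNNNNN-DD.AAAA.J.TR.OOOO (sequence, check digits, filing year, judiciary segment, court, and unit of origin). Assuming that layout, the sketch below splits a number into its fields, which is enough to filter cases by filing year. The field names and the mod-97 check-digit rule are assumptions based on that numbering standard, not something stated in this README.

```python
import re
from typing import NamedTuple

# Assumed layout: NNNNNNN-DD.AAAA.J.TR.OOOO
CASE_NUMBER = re.compile(r"^(\d{7})-(\d{2})\.(\d{4})\.(\d)\.(\d{2})\.(\d{4})$")


class CaseNumber(NamedTuple):
    sequence: str      # sequential number within the unit of origin
    check_digits: str  # two verification digits
    year: int          # filing year
    segment: str       # judiciary segment
    court: str         # court code
    origin: str        # unit of origin


def parse_case_number(text):
    """Split a case number into its fields, or raise ValueError."""
    match = CASE_NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized case number: {text!r}")
    seq, check, year, segment, court, origin = match.groups()
    return CaseNumber(seq, check, int(year), segment, court, origin)


def has_valid_check_digits(case):
    """Check the two verification digits (assumed rule: 98 - N mod 97)."""
    base = int(
        f"{case.sequence}{case.year:04d}{case.segment}{case.court}{case.origin}00"
    )
    return int(case.check_digits) == 98 - base % 97
```

Parsing `"1037133-31.2015.8.26.0100"` yields `year == 2015`, so a condition such as `2008 <= case.year <= 2017` selects the period covered by the TJSP list.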

Dataset 2 - The entire file content of each procedure

The previous figure refers to a 2015 lawsuit. Legal proceedings from 2013 onwards are digital (or electronic) cases, and their pages on the TJSP website contain a link that gives access to the full case file in PDF format:

Figure 2

When you click on this link, a new window opens, with all pages of the case file in PDF format.

Figure 3: Filings of a bankruptcy lawsuit in e-Saj

Note that this process alone has more than 68 thousand pages in PDF. The page shown is a judicial decision. All of these PDFs have been downloaded.

Inputs

  • Structured data on the process, extracted from the São Paulo State Court of Justice (TJSP) system:

    1. Name of the judge;
    2. Names of the lawyers;
    3. Names of the parties to the proceedings;
    4. (...)
  • PDFs of the case files:

    1. Petitions;
    2. Judicial decisions;
    3. (...)

Outputs

  • For each document written by the judge, a classification as positive or negative from the point of view of the applicant (the party that started the legal action).
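
This README does not describe how the classification is performed. As a rough illustration of the expected output only, the sketch below is a naive keyword baseline. It looks for a few common Portuguese ruling formulas and returns a label. The pattern lists and the function name `classify_decision` are assumptions chosen for the example, not the project's method.

```python
import re

# Hypothetical keyword lists; a real classifier would need far more coverage.
POSITIVE_PATTERNS = [r"\bdefiro\b", r"\bjulgo\s+procedente\b"]
NEGATIVE_PATTERNS = [r"\bindefiro\b", r"\bjulgo\s+improcedente\b"]


def count_matches(patterns, text):
    """Count how many times any of the patterns occurs in the text."""
    return sum(len(re.findall(pattern, text)) for pattern in patterns)


def classify_decision(text):
    """Label a judge's document from the applicant's point of view.

    Returns "positive", "negative", or "undetermined" when the keyword
    counts tie or no keyword is found.
    """
    lowered = text.lower()
    positive = count_matches(POSITIVE_PATTERNS, lowered)
    negative = count_matches(NEGATIVE_PATTERNS, lowered)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "undetermined"
```

The word boundaries matter here: without them, "indefiro" would also count as "defiro" and "improcedente" as "procedente".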

Future Developments

The results obtained so far by the project are only the initial stage of a major objective: extracting the maximum amount of information and value from lawsuits, focusing mainly on the sentiment involved in each sentence.

The fundamental mission is to detect possible biases or influences in the Brazilian judicial system that may lead to judgments at odds with the law and civil rights. After all, such misjudgments tend to affect mainly individuals with little knowledge of their rights, often in a situation of social vulnerability and, consequently, with few opportunities for defense. In addition, the technical rigor of legal language makes sentences hard to understand, so that people outside the area of law struggle to understand what was decided without the help of an expert.

That said, we decided to focus on detecting some social problems that can also be reflected in the judicial system. That is, we want to verify whether the type of decision made by each of the analyzed judges is affected by behavioral biases that are prejudiced or at odds with social well-being, such as:

  1. Gender bias (judge tends to change his/her decision pattern according to the gender of those involved in the process);
  2. Race bias (judge tends to change his/her decision pattern according to the race of those involved in the process);
  3. More rigid/malleable judges according to the type of process (behavior analysis);
  4. Changes in the decision pattern of judges over time (effect of experience on decisions).

These are just a few of the many types of analysis that can be made from legal data, which can enable the use of technologies to monitor and modernize the judicial system, making it closer to society. Therefore, the use of data science tools in this field can contribute to an increasingly fair and coherent system, expanding the sense of justice and equality in the most diverse social strata.

Note about licensing

The project is licensed under the 3-clause BSD license, but for some clustering experiments dbmap is used, which has a GPL-v3 license.
