April 20, 2024

Large language models struggle to make accurate legal arguments • The Register

Interview According to new research, leading large language models tend to generate inaccurate legal information and should not be relied on for litigation.

Last year, when OpenAI demonstrated that GPT-4 was capable of passing the bar exam, it was heralded as a breakthrough in AI and led some people to wonder whether the technology could soon replace lawyers. Some hoped that these types of models could empower people who cannot afford expensive lawyers to seek legal justice, making access to legal help more equitable. However, the reality is that LLMs cannot even help professional lawyers effectively, according to a recent study.

The biggest concern is that AI often fabricates false information, which poses a big problem, especially in an industry that relies on factual evidence. A team of researchers from Yale and Stanford University who analyzed hallucination rates in popular large language models found that they often do not accurately retrieve and generate relevant legal information, nor understand or reason about various laws.

In fact, OpenAI’s GPT-3.5, which currently powers the free version of ChatGPT, hallucinates about 69 percent of the time when tested on different tasks. Results were worse for PaLM-2, the system previously behind Google’s Bard chatbot, and Llama 2, the large language model launched by Meta, which generated falsehoods at rates of 72 and 88 percent, respectively.

As expected, the models struggle more with complex tasks than with easier ones. Asking AI to compare different cases and see if they agree on an issue, for example, is challenging, and is more likely to generate inaccurate information than an easier task, such as checking which court a case was filed in.

Although LLMs excel at processing large amounts of text and can train on enormous amounts of legal documents (more than any human lawyer could read in a lifetime), they do not understand the law and cannot form strong arguments.

“While we’ve seen these types of models make great strides in forms of deductive reasoning in coding or math problems, those are not the kinds of skills that characterize top-tier lawyers,” Daniel Ho, co-author of the Yale-Stanford paper, tells The Register.

“What lawyers are really good at and what they excel at is often described as a form of analogical reasoning in a common law system, reasoning based on precedent,” added Ho, associate faculty director at the Stanford Institute for Human-Centered Artificial Intelligence.

Machines also often fail at simple tasks. When asked to inspect a name or citation to check whether a case is real, GPT-3.5, PaLM-2, and Llama 2 can invent false information in their answers.

“The model doesn’t need to know anything about the law honestly to answer that question correctly. It just needs to know whether a case exists or not, and it can see it anywhere in the training corpus,” Matthew Dahl, a PhD law student at Yale University, says.

It shows that AI cannot even accurately retrieve information and that there is a fundamental limit to the technology’s capabilities. These models are often primed to be agreeable and helpful. They usually don’t bother correcting users’ assumptions and instead take their side. If chatbots are asked to generate a list of cases in support of some legal argument, for example, they are more likely to make up cases than to respond with nothing. A couple of lawyers learned this the hard way when they were disciplined for citing cases that were completely made up by OpenAI’s ChatGPT in their court filing.

The researchers also found that all three models they tested were more likely to be knowledgeable about federal litigation involving the U.S. Supreme Court compared to localized legal proceedings involving smaller, less powerful courts.

Since GPT-3.5, PaLM-2, and Llama 2 were trained on text scraped from the internet, it makes sense that they would be more familiar with the publicly released legal opinions of the U.S. Supreme Court than with legal documents filed in other types of courts that are not so easily accessible.

They were also more likely to struggle on tasks that involved recalling information from very old and very recent cases.

“Hallucinations are most common among older and newer Supreme Court cases, and less common among postwar Warren Court cases (1953-1969),” according to the article. “This result suggests another important limitation in the legal knowledge of LLMs that users should be aware of: the peak performance of LLMs may lag several years behind the current state of the doctrine, and LLMs may not internalize the case law that is very old but still applicable and relevant law.”

Too Much AI Could Create a ‘Monoculture’

Researchers were also concerned that over-reliance on these systems could create a legal “monoculture.” Since the AI is trained on a limited amount of data, it will refer to more prominent and well-known cases, leading lawyers to ignore other relevant legal interpretations or precedents. They may overlook other cases that could help them see different perspectives or arguments, which could be crucial in litigation.

“The law itself is not monolithic,” says Dahl. “A monoculture is particularly dangerous in a legal environment. In the United States, we have a federal common law system where the law develops differently in different states in different jurisdictions. There are different lines or trends of jurisprudence that develop over time.”

“It could lead to erroneous results and unjustified confidence in a way that could harm litigants,” Ho adds. He explained that a model could give inaccurate answers to lawyers or to people seeking to understand something like eviction laws.

“When you seek the help of a large language model, you may get exactly the wrong answer about when you should file or what the type of eviction rule is in this state,” he says, citing an example. “Because what it tells you is New York law or California law, as opposed to the law that really matters for your particular circumstances in your jurisdiction.”

The researchers conclude that the risks of using these types of popular models for legal work are greatest for those filing in lower courts in smaller states, particularly if they are less experienced and query the models based on false assumptions. These people are more likely to be attorneys who are less powerful and come from smaller law firms with fewer resources, or people looking to represent themselves.

“In summary, we find that the risks are greatest for those who would benefit most from LLMs,” the paper states. ®
