


20 Feb 2025 Technology

‘Marked improvements’ in AI's legal answers

A report by a leading British law firm has concluded that artificial-intelligence (AI) tools may be reaching the stage where they can be used as a cross-check for lawyers working on a legal issue.

The Law Society Gazette of England and Wales says that testing by ‘magic circle’ firm Linklaters found marked improvements in how large language models (LLMs) were able to answer questions about different areas of legal practice.

Despite these advances, however, the responses were still not always right and lacked nuance.

'Less good' at interpreting clauses

The firm’s report on its findings said that AI tools should not be used for English-law legal advice without expert human supervision.

The report added, however, that “if that expert supervision is available, they are getting to the stage where they could be useful, for example, by creating a first draft or as a cross-check.

“This is particularly the case for tasks that involve summarising relatively well-known areas of law. In contrast, their ability to apply the law to the facts or interpret clauses is less good,” Linklaters concludes.

'Fictional citations'

The Gazette says that the firm’s researchers asked questions that would require advice from a competent mid-level (two years’ post-qualification experience) lawyer, specialised in that practice area.

The LLMs’ answers were marked out of ten by senior Linklaters lawyers, comprising five marks for substance, three for whether the answer was supported by relevant statute or case law, and two marks for clarity.

The last benchmarking exercise in October 2023 had revealed major flaws in the models tested, with mostly wrong answers and some fictional citations.

Linklaters said that, following feedback, it would change its methodology to allow more sophisticated prompt engineering.

AI ‘tried too hard’

In the latest testing, both Gemini 2.0 and OpenAI o1 scored at least six out of ten and showed material increases in the scores for substance and the accuracy of citations.

GPT-4 scored just 3.2 out of ten, recording only one out of five for the substance of its answers.

Lawyers noted that it felt like the AI tools had “tried too hard” – in many cases producing the right answer, but alongside a lot of extra and duplicative material.

‘Eagerness’ led to overstatement

The report said that one potential problem was that the “eagerness” of the models to provide a clear answer led to their overstating the confidence of their advice.

The report concluded that, even if the flaws in AI technology were ironed out, that did not necessarily mean that human involvement would be removed.

It added: “Breaking the client’s requirements down into a series of legal steps that will achieve the client’s aim with the minimum effort, expense and uncertainty is the interesting and creative part of being a lawyer.

"Answering nutshell questions is the easy bit,” the report concluded

Gazette Desk
Gazette.ie is the daily legal news site of the Law Society of Ireland
