Releases: Aleph-Alpha/intelligence-layer-sdk
v6.0.0
6.0.0
Features
- Remove cap for max_concurrency in LimitedConcurrencyClient.
- Introduce abstract LanguageModel class to integrate with LLMs from any API
  - Every LanguageModel supports echo to retrieve log probs for an expected completion given a prompt
- Introduce abstract ChatModel class to integrate with chat models from any API
  - Introducing Pharia1ChatModel for usage with pharia-1 models.
  - Introducing Llama3ChatModel for usage with llama models.
- Upgrade ArgillaWrapperClient to use Argilla v2.x
- (Beta) Add DataClient and StudioDatasetRepository as connectors to Studio for submitting data.
- Add the optional argument generate_highlights to MultiChunkQa, RetrieverBasedQa and SingleChunkQa. This makes it possible to disable highlighting for performance reasons.
Fixes
- Increase number of returned log_probs in EloQaEvaluationLogic to avoid missing a valid answer
Deprecations
- Removed DefaultArgillaClient
- Deprecated Llama2InstructModel
Breaking Changes
- We needed to upgrade argilla-server image version from argilla-server:v1.26.0 to argilla-server:v1.29.0 to maintain compatibility.
  - Note: We also updated our elasticsearch argilla backend to 8.12.2
Full Changelog: v5.1.0...v6.0.0
v5.1.0
5.1.0
Features
- Updated DocumentIndexClient with support for metadata filters.
  - Add documentation for filtering to document_index.ipynb.
- Add StudioClient as a connector for submitting traces.
- You can now specify a chunk_overlap when creating an index in the Document Index.
- Add support for monitoring progress in the document index connector when embedding documents.
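To illustrate what a chunk overlap means, here is a minimal sketch of overlapping chunking. The helper chunk_with_overlap is hypothetical and splits on characters; the Document Index may chunk differently (for example on tokens), so this only illustrates the concept and is not the service's implementation.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks; consecutive chunks share chunk_overlap characters.

    Hypothetical helper for illustration only (character-based, not token-based).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

A larger overlap repeats more context across neighbouring chunks, at the cost of producing more chunks to embed.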
Fixes
- TaskSpan now properly sets its status to Error on crash.
Deprecations
- Deprecate old Trace Viewer as the new StudioClient replaces it. This affects Tracer.submit_to_trace_viewer.
Full Changelog: v5.0.3...v5.1.0
v5.0.3
5.0.2
Fixes
- Reverted a bug introduced in MultipleChunkRetrieverQa text highlighting.
Full Changelog: v5.0.1...v5.0.2
5.0.1
5.0.1
Fixes
- Serialization and deserialization of ExportedSpan and its attributes now works as expected.
- PromptTemplate.to_rich_prompt now always returns an empty list for prompt ranges that are empty.
- SingleChunkQa no longer crashes if given an empty input and a specific prompt template. This did not affect users who used models provided in core.
- Added default values for labels and metadata for EvaluationOverview and RunOverview
- In the MultipleChunkRetrieverQa, text-highlight start and end points are now restricted to within the text length of the respective chunk.
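The restriction of highlight positions can be pictured as a simple clamp. The function clamp_highlight below is a hypothetical illustration under that assumption, not the code used in MultipleChunkRetrieverQa.

```python
def clamp_highlight(start: int, end: int, chunk_length: int) -> tuple[int, int]:
    """Restrict a highlight range to the interval [0, chunk_length].

    Hypothetical helper; it also guarantees that end is never before start.
    """
    clamped_start = max(0, min(start, chunk_length))
    clamped_end = max(clamped_start, min(end, chunk_length))
    return clamped_start, clamped_end


# A highlight that runs past the end of a 100-character chunk is cut off at 100.
print(clamp_highlight(5, 120, chunk_length=100))
```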
Full Changelog: v5.0.0...v5.0.1
v5.0.0
5.0.0
Breaking Changes
- RunRepository.example_output now returns None and prints a warning when there is no associated record for the given run_id instead of raising a ValueError.
- RunRepository.example_outputs now returns an empty list and prints a warning when there is no associated record for the given run_id instead of raising a ValueError.
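For callers, the practical consequence is that a try/except ValueError around these calls becomes a check for None or an empty list. The sketch below uses ToyRunRepository, a hypothetical stand-in that only mimics the return behavior described above; its method signatures and its use of warnings.warn are assumptions and do not reflect the SDK's actual interface.

```python
import warnings
from typing import Optional


class ToyRunRepository:
    """Hypothetical stand-in; not the SDK's RunRepository."""

    def __init__(self) -> None:
        # Maps run_id -> example_id -> output.
        self._outputs: dict[str, dict[str, str]] = {}

    def store(self, run_id: str, example_id: str, output: str) -> None:
        self._outputs.setdefault(run_id, {})[example_id] = output

    def example_output(self, run_id: str, example_id: str) -> Optional[str]:
        if run_id not in self._outputs:
            warnings.warn(f"No record found for run_id {run_id!r}")
            return None  # previously this situation raised ValueError
        return self._outputs[run_id].get(example_id)

    def example_outputs(self, run_id: str) -> list[str]:
        if run_id not in self._outputs:
            warnings.warn(f"No record found for run_id {run_id!r}")
            return []  # previously this situation raised ValueError
        return list(self._outputs[run_id].values())


repository = ToyRunRepository()
repository.store("run-1", "example-1", "some output")

output = repository.example_output("unknown-run", "example-1")
if output is None:
    # Callers that used to catch ValueError now branch on None instead.
    print("no output recorded for this run")
```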
Features
- Runner.run_dataset can now be resumed after failure by setting the resume_from_recovery_data flag to True and calling Runner.run_dataset again.
  - For InMemoryRunRepository based Runners this is limited to runs that failed with an exception that did not crash the whole process/kernel.
  - For FileRunRepository based Runners even runs that crashed the whole process can be resumed.
- DatasetRepository.examples now accepts an optional parameter examples_to_skip to enable skipping of Examples with the provided IDs.
- Add how_to_resume_a_run_after_a_crash notebook.
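The idea behind resuming a run can be sketched as "skip every example that already has a stored output". The function run_with_recovery and the in-memory recovery dictionary below are hypothetical; they illustrate the concept only and are not how Runner.run_dataset is implemented.

```python
from typing import Callable, Iterable


def run_with_recovery(
    example_ids: Iterable[str],
    task: Callable[[str], str],
    completed: dict[str, str],
) -> dict[str, str]:
    """Run task for every example that has no entry in completed yet.

    completed plays the role of recovery data: it survives a failed attempt
    and tells the next attempt which examples to skip.
    """
    for example_id in example_ids:
        if example_id in completed:
            continue  # already processed in an earlier attempt
        completed[example_id] = task(example_id)
    return completed


calls: list[str] = []
fail_once = {"b"}


def flaky_task(example_id: str) -> str:
    """Toy task that fails the first time it sees example "b"."""
    calls.append(example_id)
    if example_id in fail_once:
        fail_once.discard(example_id)
        raise RuntimeError("simulated failure")
    return example_id.upper()


recovery: dict[str, str] = {}
try:
    run_with_recovery(["a", "b", "c"], flaky_task, recovery)
except RuntimeError:
    pass  # the first attempt fails at "b"; the output for "a" is kept

results = run_with_recovery(["a", "b", "c"], flaky_task, recovery)
print(results)
```

Keeping the recovery data only in memory corresponds to the limitation noted for InMemoryRunRepository: it is lost if the whole process crashes, whereas file-backed recovery data would survive.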
Fixes
- Remove unnecessary dependencies from IL
- Added default values for labels and metadata for PartialEvaluationOverview
Full Changelog: v4.1.0...v5.0.0
v4.1.0
4.1.0
New Features
- Add eot_token property to ControlModel and derived classes (LuminousControlModel, Llama2InstructModel and Llama3InstructModel) and let PromptBasedClassify use this property instead of a hardcoded string.
- Introduce a new argilla client ArgillaWrapperClient. This uses the argilla package as a connection to argilla and supports all question types that argilla supports in their FeedbackDataset. This includes text and yes/no questions. For more information about the questions, check their official documentation.
  - Changes to switch:
    - DefaultArgillaClient -> ArgillaWrapperClient
    - Question -> argilla.RatingQuestion, options -> values and it takes only a list
    - Field -> argilla.TextField
- Add description parameter to Aggregator.aggregate_evaluation to allow individual descriptions without the need to create a new Aggregator. This was missing from the previous release.
- Add optional field metadata to Dataset, RunOverview, EvaluationOverview and AggregationOverview
  - Update parameter_optimization.ipynb to demonstrate usage of metadata
- Add optional field label to Dataset, RunOverview, EvaluationOverview and AggregationOverview
- Add unwrap_metadata flag to aggregation_overviews_to_pandas to enable inclusion of metadata in pandas export. Defaults to True.
Fixes
- Reinitializing different AlephAlphaModel instances and retrieving their tokenizer should now consume a lot less memory.
- Evaluations now raise errors if ids of examples and outputs no longer match. If this happens, continuing the evaluation would only produce incorrect results.
- Performing evaluations on runs with a different number of outputs now raises errors. Continuing the evaluation in this case would only lead to an inconsistent state.
Full Changelog: v4.0.1...v4.1.0
v4.0.1
Breaking Changes
- Remove the Trace class, as it was no longer used.
- Renamed example_trace to example_tracer and changed return type to Optional[Tracer].
- Renamed example_tracer to create_tracer_for_example.
- Replaced langdetect with lingua as language detection tool. This means that old thresholds for detection might need to be adapted.
New Features
- Lineages now contain Tracer for individual Outputs.
- convert_to_pandas_data_frame now also creates a column containing the Tracers.
- run_dataset now has a flag trace_examples_individually to create Tracers for each example. Defaults to True.
- Added optional metadata field to Example.
Fixes
- ControlModels throw a warning instead of an error in case a not-recommended model is selected.
- The LimitedConcurrencyClient.max_concurrency is now capped at 10, which is its default, as the underlying aleph_alpha_client does not support more currently.
- ExpandChunk now works properly if the chunk of interest is not at the beginning of a very large document. As a consequence, MultipleChunkRetrieverQa now works better with larger documents and should return fewer None answers.
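A limit on concurrent calls is commonly enforced with a semaphore. The sketch below shows that general technique with a hypothetical ConcurrencyLimiter class; it is not the implementation of LimitedConcurrencyClient, and the peak counter exists only to make the effect observable.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyLimiter:
    """Hypothetical wrapper that lets at most max_concurrency calls run at once."""

    def __init__(self, max_concurrency: int) -> None:
        self._semaphore = threading.Semaphore(max_concurrency)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous calls observed

    def call(self, func, *args):
        with self._semaphore:  # blocks while max_concurrency calls are running
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return func(*args)
            finally:
                with self._lock:
                    self._active -= 1


def slow_square(n: int) -> int:
    time.sleep(0.01)  # stands in for a network request
    return n * n


limiter = ConcurrencyLimiter(max_concurrency=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda n: limiter.call(slow_square, n), range(6)))

print(results, "peak concurrency:", limiter.peak)
```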
Full Changelog: v3.0.0...v4.0.1
v3.0.0
3.0.0
Breaking Changes
- We removed the trace_id as a concept from various tracing-related functions and moved them to a context. If you did not directly use the trace_id there is nothing to change.
  - Task.run no longer takes a trace id. This was a largely unused feature, and we revamped the trace ids for the traces.
  - Creating Span, TaskSpan or logs no longer takes trace_id. This is handled by the spans themselves, which now have a context that identifies them.
    - Span.id is therefore also removed. This can be accessed by span.context.trace_id, but has a different type.
  - The OpenTelemetryTracer no longer logs a custom trace_id into the attributes. Use the existing ids from its context instead.
  - Accessing a single trace from a PersistentTracer.trace() is no longer supported, as the user does not have access to the trace_id anyway. The function is now called traces and returns all available traces for a tracer.
- InMemoryTracer and derivatives are no longer pydantic.BaseModel. Use the export_for_viewing function to export a serializable representation of the trace.
- We updated the graders to support python 3.12 and moved away from nltk-package:
  - BleuGrader now uses sacrebleu-package.
  - RougeGrader now uses the rouge_score-package.
- When using the ArgillaEvaluator, attempting to submit to a dataset that already exists will no longer append to the dataset. This makes it more in-line with other evaluation concepts.
  - Instead of appending to an active argilla dataset, you now need to create a new dataset, retrieve it and then finally combine both datasets in the aggregation step.
- The ArgillaClient now has methods create_dataset for less fault-ignoring dataset creation and add_records for performant uploads.
New Features
- Add support for Python 3.12
- Add skip_example_on_any_failure flag to evaluate_runs (defaults to True). This lets you configure whether to keep an example for evaluation even if it failed for some run.
- Add how_to_implement_incremental_evaluation.
- Add export_for_viewing to tracers to be able to export traces in a unified format similar to OpenTelemetry.
  - This is not supported for the OpenTelemetryTracer because of technical incompatibilities.
- All exported spans now contain the status of the span.
- Add description parameter to Evaluator.evaluate_runs and Runner.run_dataset to allow individual descriptions without the need to create a new Evaluator or Runner.
- All models raise an error during initialization if an incompatible name is passed, instead of only when they are used.
- Add aggregation_overviews_to_pandas function to allow for easier comparison of multiple aggregation overviews.
- Add parameter_optimization.ipynb notebook to demonstrate the optimization of tasks by comparing different parameter combinations.
- Add convert_file_for_viewing in the FileTracer to convert the trace file format to the new (OpenTelemetry style) format and save as a new file.
- All tracers can now call submit_to_trace_viewer to send the trace to the Trace Viewer.
Fixes
- The document index client now correctly URL-encodes document names in its queries.
- The ArgillaEvaluator now properly supports dataset_name.
- Update outdated how_to_human_evaluation_via_argilla.ipynb.
- Fix bug in FileSystemBasedRepository causing spurious mkdir failure if the file actually exists.
- Update broken README links to Read The Docs.
- Fix a broken multi-label classify example in the evaluation tutorial.
Full Changelog: v2.0.0...v3.0.0
v2.0.0
2.0.0
Breaking Changes
- Changed the behavior of IncrementalEvaluator::do_evaluate such that it now sends all SuccessfulExampleOutputs to do_incremental_evaluate instead of only the new SuccessfulExampleOutputs.
New Features
- Add generic EloEvaluationLogic class for implementation of Elo evaluation use cases.
- Add EloQaEvaluationLogic for Elo evaluation of QA runs, with optional later addition of more runs to an existing evaluation.
- Add EloAggregationAdapter class to simplify using the ComparisonEvaluationAggregationLogic for different Elo use cases.
- Add elo_qa_eval tutorial notebook describing the use of an (incremental) Elo evaluation use case for QA models.
- Add how_to_implement_elo_evaluations how-to as skeleton for implementing Elo evaluation cases
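For readers unfamiliar with Elo, the following is a minimal sketch of the standard Elo rating update after one pairwise comparison. The function names, the K-factor of 32 and the scale of 400 are conventional choices and assumptions here; the source does not confirm which values or code the Elo classes above use.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(
    rating_a: float,
    rating_b: float,
    score_a: float,
    k_factor: float = 32.0,
) -> tuple[float, float]:
    """Return the new ratings of A and B after one comparison.

    score_a is 1.0 if A wins, 0.5 for a draw and 0.0 if A loses.
    """
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k_factor * (score_a - expected_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k_factor * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Two equally rated models; model A wins the comparison.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)
```

Because one side gains exactly what the other loses, the sum of both ratings stays constant, which makes ratings from repeated comparisons directly comparable.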
Fixes
- ExpandChunks-task is now fast even for very large documents
Full Changelog: v1.2.0...v2.0.0