temp #3 (Draft)

Wovchena wants to merge 955 commits into base: ref.py from temp

Conversation

@Wovchena (Owner) commented Dec 6, 2023

No description provided.

@Wovchena Wovchena force-pushed the temp branch 2 times, most recently from d168f1b to f08cf61 Compare December 8, 2023 10:41
@Wovchena Wovchena force-pushed the temp branch 2 times, most recently from f978336 to 23b3427 Compare January 3, 2024 13:25
Wovchena pushed a commit that referenced this pull request May 10, 2024
Added cmake build type before project clause
Wovchena pushed a commit that referenced this pull request Jun 19, 2024
Preemption algorithm finalization
olpipi and others added 21 commits July 30, 2024 18:20
Compression currently fails with the latest `optimum-intel` version

Changes:
- Update usage of `_check_default_4bit_configs` after
huggingface/optimum-intel#843
- Update optimum-intel version

---------

Co-authored-by: Ekaterina Aidova <[email protected]>
…lkit#716)

Bumps [optimum[openvino]](https://github.com/huggingface/optimum) from
1.20.0 to 1.21.2.
Release notes, sourced from [optimum[openvino]'s releases](https://github.com/huggingface/optimum/releases):

v1.21.2: Patch release
- Remove inplace op in mistral patcher by @IlyasMoutawwakil in huggingface/optimum#1938
- Fix ORTModelForFeatureExtraction modeling by @moria97 in huggingface/optimum#1941

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.1...v1.21.2

v1.21.1: Patch release
- Fix sentence transformers model patching by @echarlaix in huggingface/optimum#1936
- Update Intel extra by @echarlaix in huggingface/optimum#1935
- Update Habana extra by @regisss in huggingface/optimum#1937

Full Changelog: https://github.com/huggingface/optimum/compare/v1.21.0...v1.21.1

Commits:
- [`4237e1d`](https://github.com/huggingface/optimum/commit/4237e1d8cebb1b9b33fd3b1f75f71e8c97bbace8) Release: v1.21.2
- [`5c803db`](https://github.com/huggingface/optimum/commit/5c803db8cef21b22d0bdbf8a69653b74656e193e) Fix forward bug in ORTModelForFeatureExtraction (huggingface/optimum#1941)
- [`f755a58`](https://github.com/huggingface/optimum/commit/f755a58e56597f690be4a0c4bdb549ce0ffd4e03) Remove inplace op in mistral patcher (huggingface/optimum#1938)
- [`f7912d6`](https://github.com/huggingface/optimum/commit/f7912d64ec23a986355e9bcdf23a947e8a91acd8) Update Habana extra (huggingface/optimum#1937)
- [`4e01a4a`](https://github.com/huggingface/optimum/commit/4e01a4a948cf48a9152f86349e82ea6cc72a0d03) Update optimum intel extra (huggingface/optimum#1935)
- [`ae591be`](https://github.com/huggingface/optimum/commit/ae591be7632b1148430b884aaeb49e78ce561b8d) Fix sentence transformers model patching (huggingface/optimum#1936)
- [`16d4d72`](https://github.com/huggingface/optimum/commit/16d4d7298ba721438e2bed58a6a8e586eb50519c) Update dev version (huggingface/optimum#1934)
- [`86adc3e`](https://github.com/huggingface/optimum/commit/86adc3e50a2bed04c8ecf86e1eba170b451e4afd) Support transformers 4.42 (huggingface/optimum#1929)
- [`a5500c7`](https://github.com/huggingface/optimum/commit/a5500c7e5047ec43e73925a01a1e98b72e64b0d3) Fixed bug: key error "last_hidden_state" (huggingface/optimum#1674)
- [`d82d4c6`](https://github.com/huggingface/optimum/commit/d82d4c656ed80da6684cd4d3766edfda8e7a1705) Fix incorrect names for usage of blenderbot for causallm (huggingface/optimum#1887)
- Additional commits viewable in the [compare view](https://github.com/huggingface/optimum/compare/v1.20.0...v1.21.2)


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=optimum[openvino]&package-manager=pip&previous-version=1.20.0&new-version=1.21.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

Dependabot will merge this PR once CI passes on it, as requested by
@Wovchena.


---

Dependabot commands and options. You can trigger Dependabot actions by
commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)



Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alina Kladieva <[email protected]>
Co-authored-by: Anastasiia Pnevskaia <[email protected]>
Co-authored-by: Nikita Malinin <[email protected]>
Co-authored-by: Yaroslav Tarkan <[email protected]>
Co-authored-by: Anatoliy Talamanov <[email protected]>
Co-authored-by: Pavel Esir <[email protected]>
Co-authored-by: Miłosz Żeglarski <[email protected]>
Co-authored-by: Pavel Esir <[email protected]>
Co-authored-by: Alexander Suvorov <[email protected]>
Co-authored-by: Xiake Sun <[email protected]>
Co-authored-by: Damian Kalinowski <[email protected]>
Co-authored-by: Andrei Kochin <[email protected]>
Co-authored-by: Ekaterina Aidova <[email protected]>
- Simplified the partial preemption algorithm for groups with multiple
sequences (a unified path is sketched below).
- Removed the split into separate single-sequence and multiple-sequence
paths.
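Purely as an illustration of a unified partial-preemption path, a minimal sketch: `SequenceGroup`, `blocks_per_sequence`, and `preempt_partially` are hypothetical names, not the scheduler's real API. The point is that single-sequence groups are just the degenerate case of the same loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a group of sequences sharing one request.
struct SequenceGroup {
    std::vector<size_t> blocks_per_sequence;  // KV-cache blocks held by each sequence
};

// Preempt `blocks_needed` KV-cache blocks from `group`, treating single- and
// multi-sequence groups uniformly: repeatedly release the last block of the
// sequence that currently holds the most blocks.
size_t preempt_partially(SequenceGroup& group, size_t blocks_needed) {
    size_t freed = 0;
    while (freed < blocks_needed) {
        auto longest = std::max_element(group.blocks_per_sequence.begin(),
                                        group.blocks_per_sequence.end());
        if (longest == group.blocks_per_sequence.end() || *longest == 0)
            break;   // nothing left to release
        --*longest;  // release that sequence's last KV-cache block
        ++freed;
    }
    return freed;  // may be less than requested if the group ran out of blocks
}
```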
…penvinotoolkit#649)

Changes:
- Further split the greedy and multinomial paths: the original logits
buffer is used in greedy sampling, and in multinomial sampling whenever
possible. A sorted vector is created only when the top_p or top_k
filter needs to be applied.
- Fixed an issue where the top_k filter was always applied during
multinomial sampling unless it was explicitly set to 0. Now the default
value (the maximum of size_t) does not trigger the top_k filter, and
the filter is also skipped when top_k is larger than the logits vector
size (see the sketch after this list).
- Skipped multinomial tests.
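As an illustration only, a minimal sketch of the skip condition described in the second bullet; `apply_top_k` and the free-function shape are assumptions, not the sampler's actual API:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Keep only the `top_k` largest logits. The filter is skipped when top_k is
// the size_t maximum (the unset default) or not smaller than the vector size.
std::vector<float> apply_top_k(std::vector<float> logits, size_t top_k) {
    if (top_k == std::numeric_limits<size_t>::max() || top_k >= logits.size())
        return logits;  // default or oversized k: no filtering
    // Partially sort so the k largest logits come first, then drop the rest.
    std::partial_sort(logits.begin(), logits.begin() + top_k, logits.end(),
                      std::greater<float>());
    logits.resize(top_k);
    return logits;
}
```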
Co-authored-by: Alina Kladieva <[email protected]>
Co-authored-by: Anastasiia Pnevskaia <[email protected]>
Co-authored-by: Nikita Malinin <[email protected]>
Co-authored-by: Yaroslav Tarkan <[email protected]>
Co-authored-by: Anatoliy Talamanov <[email protected]>
Co-authored-by: Pavel Esir <[email protected]>
Co-authored-by: Miłosz Żeglarski <[email protected]>
Co-authored-by: Pavel Esir <[email protected]>
Co-authored-by: Alexander Suvorov <[email protected]>
Co-authored-by: Xiake Sun <[email protected]>
Co-authored-by: Damian Kalinowski <[email protected]>
Co-authored-by: Andrei Kochin <[email protected]>
Co-authored-by: Ekaterina Aidova <[email protected]>
Co-authored-by: guozhong wang <[email protected]>
…envinotoolkit#690)

When the user sets `INFERENCE_PRECISION_HINT`, change the KV-cache type
accordingly.

Ticket:
[145861](https://jira.devtools.intel.com/browse/CVS-145861)
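A minimal sketch of the idea, assuming the user's properties arrive as an `ov::AnyMap`; the helper function and the f16 fallback are illustrative, not the pipeline's actual logic:

```cpp
#include <openvino/openvino.hpp>

// Pick the KV-cache element type from the user's inference precision hint,
// falling back to f16 when no hint is provided (the fallback is an assumption).
ov::element::Type kv_cache_type_from_config(const ov::AnyMap& properties) {
    auto it = properties.find(ov::hint::inference_precision.name());
    if (it == properties.end())
        return ov::element::f16;  // assumed default, not necessarily the real one
    return it->second.as<ov::element::Type>();
}
```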

---------

Co-authored-by: Dariusz Trawinski <[email protected]>
* Use sequence length axis in `trimm_tensor`
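As a hedged illustration of trimming along a sequence-length axis (the helper name and the tensor layout are assumptions; `trimm_tensor`'s real signature may differ), a sketch using the `ov::Tensor` region-of-interest constructor:

```cpp
#include <cstddef>
#include <vector>

#include <openvino/openvino.hpp>

// Return a zero-copy view of `kv_cache` keeping only the first `new_seq_len`
// positions along `seq_len_axis` (e.g. axis 2 for a
// [batch, heads, seq_len, head_size] layout).
ov::Tensor trim_along_seq_len(const ov::Tensor& kv_cache,
                              size_t seq_len_axis,
                              size_t new_seq_len) {
    const ov::Shape shape = kv_cache.get_shape();
    ov::Coordinate begin(std::vector<size_t>(shape.size(), 0));  // origin
    ov::Coordinate end(shape);            // full extent...
    end[seq_len_axis] = new_seq_len;      // ...except the trimmed axis
    return ov::Tensor(kv_cache, begin, end);  // ROI view over the same memory
}
```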
Wovchena and others added 27 commits October 12, 2024 15:56
**TODO:**
- [ ] Python API and sample
- [ ] Update doc strings
- [x] Update main README.md (PR
openvinotoolkit#930)
- [ ] Add sample with custom device mapping
- [ ] Experiment with reshape + compile as part of Ctor
- [x] Add LoRA (PR
openvinotoolkit#911)
- [x] Use std::optional for prompt2, prompt3 and maybe negative prompts
as well
- [x] Update
https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md
with text-to-image generation models
Draft VLM pipeline test
Ticket: CVS-153186

---------

Co-authored-by: wenyi5608 <[email protected]>
Co-authored-by: Wovchena <[email protected]>
Co-authored-by: Yaroslav Tarkan <[email protected]>
Co-authored-by: Alina Kladieva <[email protected]>
Co-authored-by: Pavel Esir <[email protected]>
Co-authored-by: Pavel Esir <[email protected]>
Co-authored-by: Artur Paniukov <[email protected]>
Co-authored-by: Ekaterina Aidova <[email protected]>
Co-authored-by: Ilya Lavrenov <[email protected]>
Co-authored-by: Mikhail Ryzhov <[email protected]>
Co-authored-by: Andrei Kochin <[email protected]>
Chat for continuous batching and for the static pipeline should match
the stateful pipeline and HF:

https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L1884-L1893
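As a hedged illustration of what "match" requires (all pipeline variants rendering the same chat history the same way), a minimal sketch; the message struct and the hard-coded template are placeholders, since the real pipelines apply the model's own chat template:

```cpp
#include <string>
#include <vector>

// Placeholder chat message; real pipelines take roles and contents from the
// model's chat template, not this hard-coded format.
struct ChatMessage {
    std::string role;  // "user" or "assistant"
    std::string content;
};

// Render the accumulated history into a single prompt. Every pipeline variant
// (stateful, continuous batching, static) must render the same history to the
// same string for their outputs to match each other and HF transformers.
std::string render_history(const std::vector<ChatMessage>& history) {
    std::string prompt;
    for (const ChatMessage& message : history)
        prompt += "<|" + message.role + "|>\n" + message.content + "\n";
    return prompt;
}
```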

---------

Co-authored-by: Vladimir Zlobin <[email protected]>
Use the new `Constant` constructor to create it from a memory pointer.
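A minimal sketch of the pattern using the pointer-taking `ov::op::v0::Constant` constructor; whether a given overload copies or shares the buffer is not shown here, so treat the call as illustrative:

```cpp
#include <memory>
#include <vector>

#include <openvino/op/constant.hpp>

int main() {
    // A host buffer that already holds the weight values.
    std::vector<float> weights(2 * 3, 0.5f);

    // Build the Constant node directly from the memory pointer instead of
    // going through an intermediate container first.
    auto constant = std::make_shared<ov::op::v0::Constant>(
        ov::element::f32, ov::Shape{2, 3},
        static_cast<const void*>(weights.data()));
    return 0;
}
```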

---------

Co-authored-by: Ilya Lavrenov <[email protected]>
This PR adds:
- [x] Long-form audio support with sequential chunking (a sketch of the
chunking idea follows this message).

Common Todos for Whisper support:
- [ ] Long-form audio support with [parallel
chunking](https://huggingface.co/blog/asr-chunking).
- [ ] add perf metrics
- [ ] update documentation
- [ ] add cpp, python samples tests
- [ ] support timestamps streaming
- [ ] expose only meaningful parameters in `GenerationConfig` (`task`,
`language`, `return_timestamps`, etc.)
- [ ] Move all whisper pipeline files to a dedicated subfolder
- [ ] Whisper pipeline doesn't need a tokenizer, it uses the detokenizer
only. Implement detokenizer-only initialization for
`ov::genai::Tokenizer`
- [ ] Check discrete GPU. Integrated GPU works as expected.
- [ ] Investigate use of `RemoteTensor` for GPU
- [ ] Add batch
- [ ] Add sampler, inherit WhisperGenerationConfig from GenerationConfig
- [ ] Investigate language autodetection with a single decoder (without
past) call
- [ ] Update the Python bindings cmake to include the whole directory
instead of an explicit list of files
- [ ] Add samples with audio preparation examples
- [ ] Add links to audio files so users can download them in samples
- [ ] Move the supported models list from the samples README to the
common supported models section
- [ ] Avoid building GenAI in each test job, as it takes a lot of time
- [ ] Double-check FP32 support
- [ ] Fix sporadic test failures: sometimes the whisper model cannot be
downloaded from HF due to network issues
- [ ] Fix the stop criteria. The current approach stops on eos_token,
which is the no-speech token, but there could be more speech tokens
afterwards that are wrongly skipped now.

Completed:
- [x] support different languages, language autodetection
- [x] support translation
- [x] support timestamps

Current limitations:
- No resampling during preprocessing. Input raw speech should have a
16 kHz sampling rate
- No normalization during preprocessing. Input raw speech should be
normalized to approximately the [-1, 1] range

Tickets: CVS-147994, CVS-146010, CVS-152542

---------

Co-authored-by: Ilya Lavrenov <[email protected]>
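Purely as an illustration of sequential chunking, a sketch that splits 16 kHz mono audio into fixed 30-second windows and transcribes them one after another; real sequential chunking may continue decoding from timestamp offsets rather than cutting blindly, and `transcribe` stands in for the actual pipeline call:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Sequential chunking: split 16 kHz mono audio into fixed 30 s windows and
// transcribe them one after another, concatenating the partial texts.
std::string transcribe_long_form(
        const std::vector<float>& samples,
        const std::function<std::string(const std::vector<float>&)>& transcribe) {
    constexpr std::size_t kSampleRate = 16000;               // expected input rate
    constexpr std::size_t kChunkSamples = 30 * kSampleRate;  // 30 s per window
    std::string text;
    for (std::size_t offset = 0; offset < samples.size(); offset += kChunkSamples) {
        const std::size_t length = std::min(kChunkSamples, samples.size() - offset);
        std::vector<float> chunk(samples.begin() + offset,
                                 samples.begin() + offset + length);
        text += transcribe(chunk);  // each window is decoded independently
    }
    return text;
}
```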