Implement parse data from AsyncIterator #447

Open
uasan opened this issue Dec 4, 2024 · 0 comments
Hello, it's me again )

The streams work well, but I decided to optimize further by removing all unnecessary abstractions, leaving only native JS (AsyncIterator), and the result is worth it.

I propose adding a new, simple interface for parsing data from an asynchronous iterator (generator); it offers great performance and a very simple implementation.

Userland example:

import { parse } from 'csv-parse/iterator';

async function* iterator() {
  try {
    yield Buffer.from('A,B,C\n');
  } catch (error) {
    console.error(error);
  }
}

for await (const records of parse(iterator()))
  console.log(records);

Library implementation:

async function* parse(iterator) {
  let result = null;
  const setResult = records => {
    result = records;
  };

  for await (const chunk of iterator) {
    const error = api.parse(chunk, false, setResult);

    if (error) {
      await iterator.throw(error);
    } else if (result) {
      yield result;
      result = null;
    }
  }

  // Flush any buffered records
  const error = api.parse(undefined, true, setResult);

  if (error) {
    await iterator.throw(error);
  } else if (result) {
    // Yield (not return) so `for await` consumers receive the final
    // records; a generator's return value is discarded by for-await-of
    yield result;
  }
}
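To illustrate the error path above: `iterator.throw()` delivers a parse error back into the source generator, where the author's try/catch can handle it. Here is a self-contained sketch with a hypothetical "bad chunk" check standing in for the real parser's error detection:

```javascript
// Sketch: routing a parse error back into the source generator via
// iterator.throw(). The 'bad chunk' check is a stand-in for a real
// parser error.
async function* source() {
  try {
    yield 'bad chunk';
    yield 'never reached';
  } catch (error) {
    // The error thrown by the consumer lands here
    console.error('source saw:', error.message);
  }
}

async function* parse(iterator) {
  for await (const chunk of iterator) {
    if (chunk === 'bad chunk') {
      // Deliver the error into the generator; throw() resolves once
      // the generator has handled (or rethrown) it
      await iterator.throw(new Error('Invalid CSV'));
    } else {
      yield chunk;
    }
  }
}

for await (const record of parse(source())) console.log(record);
// logs: source saw: Invalid CSV
```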

Asynchronous iterators are a great fit for this task: they work in all JavaScript environments and, in my measurements, consume less memory and CPU than stream-based implementations.

In fact, many data sources are already exposed as asynchronous iterators, and all streams provide an async iterator interface. Here is an example with fetch:

import { parse } from 'csv-parse/iterator';

const response = await fetch('file.csv');

for await (const records of parse(response.body)) {
  console.log(records);
}
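Other sources plug in the same way; any Node.js Readable stream is itself async-iterable. The sketch below uses Readable.from and a toy line splitter (parseLines) standing in for the proposed parse(), since the iterator entry point doesn't exist yet:

```javascript
// Sketch: a Node.js Readable stream is directly async-iterable, so it
// could be passed to the proposed parse() as-is. parseLines is a toy
// stand-in that splits chunks into rows.
import { Readable } from 'node:stream';

async function* parseLines(iterator) {
  let tail = '';
  for await (const chunk of iterator) {
    const lines = (tail + chunk).split('\n');
    tail = lines.pop(); // keep the incomplete trailing line
    for (const line of lines) yield line.split(',');
  }
  if (tail) yield tail.split(','); // flush
}

const source = Readable.from(['A,B\n1,', '2\n']);

for await (const record of parseLines(source)) {
  console.log(record);
}
// logs: [ 'A', 'B' ] then [ '1', '2' ]
```

Note how the record split on `1,` across two chunks is reassembled before being yielded, which is exactly the buffering a real parser does internally.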

According to my local measurements, async iterators are about 20% faster than streams, and at the user level, writing a generator function is much simpler than coding against stream interfaces.
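The 20% figure will of course vary by workload, machine, and Node.js version; a rough way to compare the two consumption styles on identical chunks is a passthrough micro-benchmark like this (no actual parsing, just iteration overhead):

```javascript
// Rough micro-benchmark sketch: consume the same chunks through an
// async generator vs a passthrough Transform stream. Numbers vary by
// workload, machine, and Node.js version.
import { Readable, Transform } from 'node:stream';
import { performance } from 'node:perf_hooks';

const chunks = Array.from({ length: 100_000 }, () => 'a,b,c\n');

async function* passthrough(iterator) {
  for await (const chunk of iterator) yield chunk;
}

let t = performance.now();
let count = 0;
for await (const _ of passthrough(chunks)) count++;
console.log(`generator: ${(performance.now() - t).toFixed(1)} ms (${count} chunks)`);

t = performance.now();
count = 0;
const piped = Readable.from(chunks).pipe(
  new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      callback(null, chunk);
    },
  }),
);
for await (const _ of piped) count++;
console.log(`stream:    ${(performance.now() - t).toFixed(1)} ms (${count} chunks)`);
```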

If you like the idea, I can open a PR.
