This issue is a proposal to add SplitOnChunkSize() to FileInfo, an extension method that splits a file into multiple files and returns an array of the newly created files. The main challenges are handling line breaks when breakOnNewlines is true, and processing large files one buffered chunk at a time so as not to exhaust system resources.
/// <summary>
/// Splits a file into multiple files based on the specified chunk size of each file.
/// </summary>
/// <param name="file">The file.</param>
/// <param name="chunkSize">The maximum number of bytes to store in each file.
/// If a chunk size is not provided, files will be split into 1 MB chunks by default.
/// The breakOnNewlines parameter can slightly affect the size of each file.</param>
/// <param name="targetPath">The destination where the split files will be saved.</param>
/// <param name="deleteAfterSplit">if set to <c>true</c>, the original file is deleted after creating the newly split files.</param>
/// <param name="breakOnNewlines">if set to <c>true</c>, break the file on the next newline once the chunk size limit is reached.</param>
/// <returns>
/// An array of references to the split files.
/// </returns>
/// <exception cref="ArgumentNullException">file</exception>
/// <exception cref="ArgumentOutOfRangeException">chunkSize - The chunk size must be larger than 0 bytes.</exception>
public static FileInfo[] SplitOnChunkSize(
    this FileInfo file,
    int chunkSize = 1000000,
    DirectoryInfo targetPath = null,
    bool deleteAfterSplit = false,
    bool breakOnNewlines = true
)
{
    if (file == null)
        throw new ArgumentNullException(nameof(file));
    if (chunkSize < 1)
        throw new ArgumentOutOfRangeException(nameof(chunkSize), chunkSize,
            "The chunk size must be larger than 0 bytes.");
    // Nothing to split: the file already fits in a single chunk.
    if (file.Length <= chunkSize)
        return new[] {file};
    var buffer = new byte[chunkSize];
    var extraBuffer = new List<byte>();
    targetPath = targetPath ?? file.Directory;
    var chunkedFiles = new List<FileInfo>((int)(file.Length / chunkSize) + 1);
    using (var input = file.OpenRead())
    {
        var index = 1;
        while (input.Position < input.Length)
        {
            var chunkFileName = new FileInfo(Path.Combine(targetPath.FullName, $"{file.Name}.CHUNK_{index++}"));
            chunkedFiles.Add(chunkFileName);
            using (var output = chunkFileName.Create())
            {
                // Fill the buffer; Read may return fewer bytes than requested.
                var chunkBytesRead = 0;
                while (chunkBytesRead < chunkSize)
                {
                    var bytesRead = input.Read(buffer,
                        chunkBytesRead,
                        chunkSize - chunkBytesRead);
                    if (bytesRead == 0)
                    {
                        break;
                    }
                    chunkBytesRead += bytesRead;
                }
                // Always write the chunk, whether or not breakOnNewlines is set.
                output.Write(buffer, 0, chunkBytesRead);
                if (breakOnNewlines && chunkBytesRead > 0)
                {
                    // Start from the last byte actually read (the final chunk
                    // may be shorter than chunkSize), then extend the chunk up
                    // to and including the next newline.
                    var extraByte = buffer[chunkBytesRead - 1];
                    while (extraByte != '\n')
                    {
                        var flag = input.ReadByte();
                        if (flag == -1)
                            break;
                        extraByte = (byte)flag;
                        extraBuffer.Add(extraByte);
                    }
                    if (extraBuffer.Count > 0)
                        output.Write(extraBuffer.ToArray(), 0, extraBuffer.Count);
                    extraBuffer.Clear();
                }
            }
        }
    }
    if (deleteAfterSplit)
        file.Delete();
    return chunkedFiles.ToArray();
}
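To make the effect of breakOnNewlines on chunk sizes easier to see, here is a minimal sketch of the same boundary logic applied to an in-memory stream. The class ChunkSketch and the helper SplitStream are hypothetical names used only for illustration. They are not part of the proposal, and the helper returns byte arrays rather than writing files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkSketch
{
    // Hypothetical helper (not part of the proposal): returns each chunk as a
    // byte array so the chunk boundaries can be inspected without touching disk.
    public static List<byte[]> SplitStream(Stream input, int chunkSize, bool breakOnNewlines)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<byte[]>();
        var buffer = new byte[chunkSize];
        while (true)
        {
            // Fill the buffer; Read may return fewer bytes than requested.
            var filled = 0;
            while (filled < chunkSize)
            {
                var n = input.Read(buffer, filled, chunkSize - filled);
                if (n == 0) break;
                filled += n;
            }
            if (filled == 0) break; // end of stream

            var chunk = new MemoryStream();
            chunk.Write(buffer, 0, filled);

            // Optionally extend the chunk through the next '\n'.
            if (breakOnNewlines && buffer[filled - 1] != (byte)'\n')
            {
                int b;
                while ((b = input.ReadByte()) != -1)
                {
                    chunk.WriteByte((byte)b);
                    if (b == '\n') break;
                }
            }
            chunks.Add(chunk.ToArray());
        }
        return chunks;
    }

    public static void Main()
    {
        var data = Encoding.ASCII.GetBytes("alpha\nbeta\ngamma\n");
        using (var input = new MemoryStream(data))
        {
            // Show each chunk on its own line with newlines made visible.
            foreach (var chunk in SplitStream(input, 4, breakOnNewlines: true))
                Console.WriteLine(Encoding.ASCII.GetString(chunk).Replace("\n", "\\n"));
        }
    }
}
```

With a chunk size of 4 and breakOnNewlines set to true, each chunk grows past 4 bytes until it ends on a line boundary, so the sample input yields one chunk per line. With breakOnNewlines set to false, the same input is cut into fixed 4-byte pieces plus a shorter final piece.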
Calling it Split() instead of SplitOnChunkSize() might also work if we want overloaded methods in the future that handle other scenarios, such as splitting on the number of lines per file.
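As a rough idea of what a line-count variant could look like, here is a sketch under stated assumptions. The method name SplitOnLineCount, its parameters, and the containing class FileInfoSplitSketch are hypothetical. It reuses the .CHUNK_n naming from the proposal above, reads text with the default encoding detection of File.ReadLines, and rewrites line endings as Environment.NewLine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileInfoSplitSketch
{
    // Hypothetical variant, not part of the proposal: starts a new output file
    // every linesPerFile lines instead of every chunkSize bytes.
    public static FileInfo[] SplitOnLineCount(
        this FileInfo file,
        int linesPerFile,
        DirectoryInfo targetPath = null)
    {
        if (file == null)
            throw new ArgumentNullException(nameof(file));
        if (linesPerFile < 1)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile), linesPerFile,
                "The number of lines per file must be larger than 0.");

        targetPath = targetPath ?? file.Directory;
        var parts = new List<FileInfo>();
        StreamWriter output = null;
        try
        {
            var linesInPart = 0;
            // File.ReadLines enumerates lazily, so only one line is held in memory at a time.
            foreach (var line in File.ReadLines(file.FullName))
            {
                if (output == null)
                {
                    var part = new FileInfo(Path.Combine(
                        targetPath.FullName, $"{file.Name}.CHUNK_{parts.Count + 1}"));
                    parts.Add(part);
                    output = part.CreateText();
                    linesInPart = 0;
                }
                output.WriteLine(line);
                if (++linesInPart == linesPerFile)
                {
                    // This part is full; the next line opens a new file.
                    output.Dispose();
                    output = null;
                }
            }
        }
        finally
        {
            output?.Dispose();
        }
        return parts.ToArray();
    }
}
```

A call such as new FileInfo("data.log").SplitOnLineCount(1000) would then produce data.log.CHUNK_1, data.log.CHUNK_2, and so on next to the original file. Whether this belongs under a shared Split() name is an open design question, since an overload taking an int line count would have to be distinguishable from one taking an int chunk size.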