
Recover un-synced data after reboot #344

Open
asteriskSF opened this issue Dec 3, 2019 · 14 comments

@asteriskSF

I'm looking for a way to recover, after a reboot, data that lfs has already written to flash but that has not yet been sync'ed to the file. Is this possible with some existing calls, or could an additional algorithm be added as a new feature?

I am interested in doing this for a single file which is always appended with logging data. We hope the log data will better explain why the system rebooted in the first place.

We're using lfs v1.7.1 with a NOR-flash.

// block device configuration
.read_size = 256,
.prog_size = 256,
.block_size = 32768,
.block_count = 512,
.lookahead = 512,

static uint8_t read_buffer[256];
static uint8_t prog_buffer[256];
static uint32_t lookahead_buffer[512/32];

@e107steved

e107steved commented Dec 4, 2019

@geky should be able to comment better on whether this would work

As a suggestion: at startup, set a flag for your low-level flash write routine. Then write some data to your file. Logically this would attempt to write to the first free flash location, which is probably where your code started writing before the reset. Note the flash location accessed (then probably return an error and clear the flag).
Depending on when the reset occurred, it's also perfectly possible that the data never got to flash.

From what I can remember of how free space is identified, this may not work every time.

Alternatively, set a flag each time you start to write to your file, and in the low-level driver note the block number written in some area of RAM which isn't cleared (the log write may be trickier to identify here, since there will be directory updates as well).

@geky
Member

geky commented Dec 5, 2019

Not possible; lfs_file_sync is how you mark the data you want to persist. Otherwise, as @e107steved mentions, it may still be in RAM.

One option is to store a "commit offset", either as a separate file or as a custom attribute (both have similar costs). This would be a single integer offset that tracks how much of the log is "committed". Decoupling the committed size from the file size would let you call lfs_file_sync more often, giving you more data to debug.

Note that lfs_file_sync has a cost, so the best approach may be to call lfs_file_sync based on a timer. This way you can batch together spikes of multiple log writes.
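A rough sketch of the custom-attribute variant, assuming the lfs_setattr/lfs_getattr custom attribute API (LOG_COMMIT_ATTR and the helper names are made up for the example):

#include "lfs.h"

#define LOG_COMMIT_ATTR 0x70

// record that everything sync'ed so far is logically "committed"
int log_mark_committed(lfs_t *lfs, lfs_file_t *file, const char *path) {
    lfs_soff_t size = lfs_file_size(lfs, file);
    if (size < 0) {
        return (int)size;
    }
    uint32_t off = (uint32_t)size;
    return lfs_setattr(lfs, path, LOG_COMMIT_ATTR, &off, sizeof(off));
}

// after a reboot, find out how much of the log had been committed
int log_committed_size(lfs_t *lfs, const char *path, uint32_t *off) {
    lfs_ssize_t res = lfs_getattr(lfs, path, LOG_COMMIT_ATTR, off, sizeof(*off));
    if (res < 0) {
        *off = 0; // attribute was never written
        return (res == LFS_ERR_NOATTR) ? 0 : (int)res;
    }
    return 0;
}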

@asteriskSF
Author

Just so I can understand this better and explain it to the rest of the team: is it not possible because there is no persistent (non-volatile) record of which block contains the unsynchronized data, i.e. data that has been programmed to flash but not yet sync'ed into the file? Would it be possible to record in the metadata which blocks are storing the un-sync'ed data?

I understand that data still in RAM certainly would not be recoverable. We were hopeful that, since only 256 bytes can be held in the programming cache, the remainder of the data (which must already have been written to non-volatile storage) could be located and therefore recovered. In addition, we check the LFS internals to determine when a write has progressed into a new block, and we always sync after a write that has moved into a new block. This minimizes the time taken by the sync, since only a minimal amount of data needs to be copied; we call it "lazy sync". As a result we have at most 1-2 blocks that have not been sync'ed at any time.
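
Roughly, the lazy sync looks like this (only a sketch; our real code inspects the LFS internals rather than just the file offset, and lazy_append / LOG_BLOCK_SIZE are names made up for this example):

#include "lfs.h"

#define LOG_BLOCK_SIZE 32768  // matches .block_size above

// append to the log, syncing only when this write crossed into a new block
int lazy_append(lfs_t *lfs, lfs_file_t *file, const void *buf, lfs_size_t len) {
    lfs_soff_t before = lfs_file_tell(lfs, file);
    if (before < 0) {
        return (int)before;
    }
    lfs_ssize_t written = lfs_file_write(lfs, file, buf, len);
    if (written < 0) {
        return (int)written;
    }
    // sync right after crossing a block boundary, while the new last block
    // is still nearly empty, so the copy done by the next append stays small
    if ((before / LOG_BLOCK_SIZE) != ((before + written) / LOG_BLOCK_SIZE)) {
        return lfs_file_sync(lfs, file);
    }
    return 0;
}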

Unfortunately with 32k blocks, quite a lot of our logging messages are lost when an unplanned reset occurs. We have chosen not to sync on a timer, because the block copy on the next file append can have a very high latency. Blocking our real-time system while LFS erases a block and copies up to 32k of data is very difficult to accommodate.

@asteriskSF
Author

If we can store the new block location of the unwritten data after each sync, could we recover the data?

@joel-felcana

I have the same question. We log continuously and at a high data rate. I noticed that after a power loss I get 0B available, no matter how many times I had written to flash (I mean real writes, calls to write_stuff_to_flash()) or how many blocks I had used. I tried lfs_file_sync() just before a reset and I could get the data back, which is terrific as it's exactly what I needed. I had thought that once a write buffer was filled, a call to sync was made automatically, and that sync was only needed to flush the buffer.

What are the drawbacks of calling sync often? Does it take one "slot" on the dir file on every sync? Would that cause sync times to steadily grow over time as it happened with open and close times in #214?

@geky
Member

geky commented Dec 18, 2019

Ah interesting. So if I understand correctly, it's not the overhead of writing the metadata that's the problem, but actually that littlefs freezes the file's blocks, forcing the next write to have to copy the last block of the file.

That is a tricky problem to solve. You could imagine on NOR flash, where you can program a byte at a time, littlefs could simply continue to write data to the file without copying the block. So after a sync the file could continue to use the last block without copying. This would make a sync as cheap as a single metadata update.

Unfortunately, this doesn't work when the prog_size > 1. If you end up writing data that isn't aligned to the program block and call sync, littlefs must write out the full program block, including garbage padding. This would prevent you from being able to continue writing to the block.

We could modify littlefs to continue using blocks if they were synced and the data was aligned to a prog-size boundary. Then you could call sync every 256 bytes without such a big disruption.

I didn't originally implement this because it's a rather niche optimization and I didn't think it would be worth the complexity.

Thinking generally, we could also repurpose the inline file mechanism used to store small files to also store data written to the end of files. This would let littlefs avoid freezing the tail of the file though it may add overhead to the metadata. This would be something I'd want to profile to see if it would be worth it. It would also be a bit complicated to implement.

@geky
Member

geky commented Dec 18, 2019

What are the drawbacks of calling sync often? Does it take one "slot" on the dir file on every sync? Would that cause sync times to steadily grow over time as it happened with open and close times in #214?

Yes, this is also a concern. The runtime cost in #214 comes from the time it takes to scan the log, which grows as more commits are added.

Though one difference is that these syncs get cleaned up during metadata compaction, whereas the creation of additional files resides in the metadata block until the file is deleted.

I noticed that after a power loss I get 0B available

Ah, this is because lfs_file_open creates a 0B file. This isn't strictly necessary, but we need to store the file name somewhere. #353

We could have a temporary "uncommitted" file placeholder written to metadata, so the file doesn't exist after a power-loss, but I haven't seen this be a big request.

Other than that quirk, littlefs strictly does not update the on-disk file unless sync is called.

If we can store the new block location of the unwritten data after each sync, could we recover the data?

Yes, if you are willing to read directly from the raw block device; as long as you know the last block, the data should still be there. The problem is that littlefs doesn't know which block was part of the file unless you call sync.

You could store this in a custom attribute.

#define UNSYNC_BLOCK 0x75
lfs_setattr(&lfs, "path/to/file", UNSYNC_BLOCK, &file->block, sizeof(file->block));

Normally you can attach custom attributes to open files with the lfs_file_config struct, but that requires you to call sync, which kinda defeats the whole purpose.
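
On the next boot you could read that attribute back and pull the raw block contents straight from the block device (a sketch only; bd_read stands in for your own low-level flash read and is not a littlefs call):

lfs_block_t unsynced_block;
lfs_ssize_t res = lfs_getattr(&lfs, "path/to/file", UNSYNC_BLOCK,
                              &unsynced_block, sizeof(unsynced_block));
if (res == (lfs_ssize_t)sizeof(unsynced_block)) {
    // bypass littlefs and read the whole 32 KiB block directly from flash
    static uint8_t raw[32768];
    bd_read(unsynced_block * 32768, raw, sizeof(raw));
    // scan raw[] for log records that were written but never sync'ed
}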

@HamzaHajeir

Well, it's a real problem; see this issue raised in the ESP8266 Arduino core:
esp8266/Arduino#8155

Summary: the file is lost if File::close (which wraps lfs_file_close) was not called.
Reason: power interruption while writing.

That runs up against the guarantee that LittleFS provides:

Power-loss resilience - littlefs is designed to handle random power failures. All file operations have strong copy-on-write guarantees and if power is lost the filesystem will fall back to the last known good state.

@M-Bab

M-Bab commented Aug 25, 2022

During extensive testing I noticed that sometimes an fsync is not enough to get all data actually written. Because we are working with small blocks of data (64 bytes), I had to artificially flood the buffer when I really want all data written:

void LOG_QuickSync(bool bFloodBuffer)
{
  if (logFileWriteHandle != NULL)
  {
    if (bFloodBuffer)
    {
      /* Fill Cache with enough data that definitely all events are written. */
      static const uint8_t DataBufferFiller[CONFIG_LITTLEFS_WRITE_SIZE] = {0};
      fwrite((void *) DataBufferFiller, sizeof(uint8_t), sizeof(DataBufferFiller), logFileWriteHandle);
      ESP_LOGI(TAG, "Forced quick sync for up-to-date log done!");
    }
    fsync(fileno(logFileWriteHandle));
  }
}

None of my tests with different littlefs configurations (changing PAGE, READ, WRITE, LOOKAHEAD and CACHE sizes) helped. So I am posting this here in case it resolves the problem for others, or in case someone comes up with a solution that makes the workaround unnecessary.

@trullock

I've just been doing some testing on this, and I have this exact behaviour (i.e. a 0B file after a power cut before file.close()) on ESP32 using Arduino, but not on ESP8266 using Arduino. With the latter it seems to have synced on every flush(). Can anyone shed any light on this before I spend hours diffing the sources and configs to try and find the difference?

@geky
Member

geky commented Sep 21, 2023

Hi @trullock, I'm not sure about the above issue, but I just wanted to mention littlefs currently creates a zero-length file when you call lfs_file_open in order to save the filename. That may be what you're seeing.

@trullock

@geky thanks, it probably is, but on the 8266 I get all bytes written up until the reset, and on the 32 I get none.

@geky
Member

geky commented Sep 21, 2023

Huh, sounds like one library is just calling lfs_file_sync at different times than the other. Maybe they have different flush implementations.

@trullock

Yeah, see here for some more clues: joltwallet/esp_littlefs#144
