feat: initial source implementation #2

grvsahil · 2024-12-13T08:07:03Z

Source implementation for SFTP connector.

Quick checks:

There is no other pull request for the same update/change.
I have written unit tests.
I have made sure that the PR is of reasonable size and can be easily reviewed.

hariso

Thanks for the good work!

hariso · 2025-01-03T18:18:16Z

source/iterator.go

+		metadata := iter.createMetadata(fileInfo, fullPath, len(chunk))
+		metadata["chunk_index"] = fmt.Sprintf("%d", chunkIndex)
+		metadata["total_chunks"] = fmt.Sprintf("%d", totalChunks)
+		metadata["hash"] = hash(fileInfo.modTime.Format(time.RFC3339))


Nitpick: so that a user doesn't think this is the chunk content hash, it would be good to change the name to something like file_mod_time_hash, or even just the time?

I've been thinking a bit about the hash. Every file with the same modification time will have the same hash, and that's quite possible. Using the file name, creation time, and/or last modification time together might be an alternative. Two files with the same path might not be the same file, because a file might have been created, then deleted, then created again with the same name.

Yes, the hash is now created from the file name, modification time, and file size.
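To illustrate the change discussed in this thread, here is a minimal sketch of a hash built from the file name, modification time, and size. It is not the code from the PR: `fileHash`, `exampleModTime`, the use of SHA-256, and the NUL separator are all assumptions made for illustration. Note that two files that match on all three inputs (for example, a file deleted and recreated with identical size and timestamp) would still produce the same value.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// fileHash is a hypothetical helper (not the PR's implementation). It combines
// the file name, modification time, and size into one SHA-256 digest, so that
// files sharing only a modification time no longer share a hash.
func fileHash(name string, modTime time.Time, size int64) string {
	// A NUL byte separates the fields so that adjacent values cannot run
	// together and produce the same input string.
	input := fmt.Sprintf("%s\x00%d\x00%d", name, modTime.UnixNano(), size)
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:])
}

// exampleModTime is an arbitrary timestamp used only for the demonstration.
var exampleModTime = time.Date(2025, time.January, 3, 18, 18, 16, 0, time.UTC)

func main() {
	// Same modification time, different names: the hashes differ.
	fmt.Println(fileHash("a.txt", exampleModTime, 1024))
	fmt.Println(fileHash("b.txt", exampleModTime, 1024))
}
```

Storing the value under a metadata key that names its inputs (as suggested in the nitpick above) keeps readers from mistaking it for a hash of the chunk content.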

hariso · 2025-01-07T12:20:44Z

source/source.go

+	s.ch = make(chan opencdc.Record)
+	s.wg = &sync.WaitGroup{}
+
+	s.wg.Add(1)
+	err = NewIterator(ctx, s.sshClient, s.sftpClient, s.position, s.config, s.ch, s.wg)


Now that I'm reading this code again, why do we actually need a channel that's shared between the source and the iterator? Can the iterator return a record on demand, i.e. when the source's Read() method is called, it calls iterator.Next(), and gets the next record or chunk?

Yes, now I have refactored the iterator to return records on demand.
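The on-demand design suggested above can be sketched roughly as follows. This is an illustration under stated assumptions, not the refactored code from the PR: `Record` stands in for `opencdc.Record`, `ErrNoRecords`, `file`, and `collect` are hypothetical, and files are held in memory instead of being fetched over SFTP. The point is that `Read` simply delegates to `Next`, so no shared channel or `sync.WaitGroup` is needed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Record is a hypothetical stand-in for opencdc.Record.
type Record struct {
	Key     string
	Payload []byte
}

// ErrNoRecords is a hypothetical sentinel meaning "nothing left to read".
var ErrNoRecords = errors.New("no records available")

// file is an in-memory stand-in for a remote file.
type file struct {
	name string
	data []byte
}

// Iterator hands out one chunk per call instead of pushing into a channel.
type Iterator struct {
	files     []file
	chunkSize int
	fileIdx   int // index of the file currently being read
	offset    int // read position within that file
}

// Next returns the next chunk, or ErrNoRecords when every file is consumed.
func (it *Iterator) Next(ctx context.Context) (Record, error) {
	if err := ctx.Err(); err != nil {
		return Record{}, err
	}
	if it.chunkSize <= 0 {
		return Record{}, errors.New("chunk size must be positive")
	}
	// Skip files that are fully read (this also skips empty files).
	for it.fileIdx < len(it.files) && it.offset >= len(it.files[it.fileIdx].data) {
		it.fileIdx++
		it.offset = 0
	}
	if it.fileIdx >= len(it.files) {
		return Record{}, ErrNoRecords
	}
	f := it.files[it.fileIdx]
	end := it.offset + it.chunkSize
	if end > len(f.data) {
		end = len(f.data)
	}
	rec := Record{Key: f.name, Payload: f.data[it.offset:end]}
	it.offset = end
	return rec, nil
}

// Source delegates each Read call to the iterator.
type Source struct {
	iter *Iterator
}

func (s *Source) Read(ctx context.Context) (Record, error) {
	return s.iter.Next(ctx)
}

// collect drains the source and renders each record as "key:payload".
func collect(s *Source) ([]string, error) {
	ctx := context.Background()
	var out []string
	for {
		rec, err := s.Read(ctx)
		if errors.Is(err, ErrNoRecords) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, rec.Key+":"+string(rec.Payload))
	}
}

func main() {
	src := &Source{iter: &Iterator{
		files:     []file{{name: "a.txt", data: []byte("hello")}, {name: "b.txt", data: []byte("hi")}},
		chunkSize: 2,
	}}
	records, err := collect(src)
	fmt.Println(records, err)
}
```

Because all state lives in the iterator's fields, the caller controls the pace of reading, and cancellation is checked on every call through the context.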

hariso · 2025-01-07T12:23:01Z

source/iterator.go

+	fullPath := filepath.Join(iter.config.DirectoryPath, filename)
+
+	// Get initial file stat.
+	initialStat, err := iter.sftpClient.Stat(fullPath)


IIUC, the source works this way: we get a list of file paths, we go through the list, and get the files.

If that's the case, then modifications to files can happen between the time when we get the list of files and when we fetch a file. A file can be moved/deleted, and in that case, I don't think we should fail.

Yes, this could happen. It is handled now.
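One way to tolerate a file disappearing between listing and fetching is sketched below. It is not the fix from the PR: `statter`, `statExisting`, `localFS`, and `demo` are hypothetical names, the local filesystem stands in for the SFTP server, and the sketch assumes the client reports a missing file with an error that matches `fs.ErrNotExist`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// statter is a hypothetical stand-in for the part of the SFTP client used here.
type statter interface {
	Stat(path string) (os.FileInfo, error)
}

// statExisting stats every listed name and skips files that vanished after
// the listing was taken. Any other error is still returned to the caller.
func statExisting(c statter, dir string, names []string) ([]os.FileInfo, error) {
	var infos []os.FileInfo
	for _, name := range names {
		info, err := c.Stat(filepath.Join(dir, name))
		if errors.Is(err, fs.ErrNotExist) {
			// The file was moved or deleted in the meantime: not a failure.
			continue
		}
		if err != nil {
			return nil, fmt.Errorf("stat %q: %w", name, err)
		}
		infos = append(infos, info)
	}
	return infos, nil
}

// localFS implements statter on the local filesystem for the demonstration.
type localFS struct{}

func (localFS) Stat(path string) (os.FileInfo, error) { return os.Stat(path) }

// demo lists two names of which only one exists and returns the names found.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "sftp-sketch-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "present.txt"), []byte("data"), 0o600); err != nil {
		return nil, err
	}
	infos, err := statExisting(localFS{}, dir, []string{"present.txt", "deleted.txt"})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(infos))
	for _, info := range infos {
		names = append(names, info.Name())
	}
	return names, nil
}

func main() {
	names, err := demo()
	fmt.Println(names, err)
}
```

Only the "not found" case is swallowed; permission or connection errors still surface, so genuine failures are not hidden.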

feat: initial source implementation

046ca3a

grvsahil self-assigned this Dec 17, 2024

separated source logic into iterator

f08eefb

parikshitg linked an issue Dec 18, 2024 that may be closed by this pull request

Connector: SFTP [Source/Destination] ConduitIO/conduit#1589

Open

2 tasks

Gaurav Sahil added 5 commits December 18, 2024 19:26

added test cases

80d9c03

added integration test

1fd6eee

remove go generate directive from source

cbe9f1a

fix: test cases

e89ed03

fix: teardown

4c4458d

grvsahil marked this pull request as ready for review December 18, 2024 21:35

Gaurav Sahil added 6 commits December 19, 2024 11:33

added go header

65ddecc

added file chunk mechanism for files larger than 3 mb

b1e919d

added configurable chunk

154627f

fix: updated readme

47bfd60

fix: large file processing

3ee28ab

fix: refactored iterator

59ddc6d

hariso reviewed Jan 3, 2025

Gaurav Sahil added 3 commits January 6, 2025 22:37

fix: handle file modification while read

0860cb6

fix: conflicts

bb4b335

fix: source integration test

5c1e4b9

hariso reviewed Jan 6, 2025

modify README file

ab09b17

hariso reviewed Jan 7, 2025

fix: refactored iterator to provide record on demand

4c82f15

hariso approved these changes Jan 9, 2025

hariso mentioned this pull request Jan 9, 2025

Connector: SFTP [Source/Destination] ConduitIO/conduit#1589

Gaurav Sahil added 2 commits January 10, 2025 16:56

added source and destination directories in docker compose

6c870ec

fix: test workflow

e374c2d
