FileProcessor

FileProcessor is an asynchronous, distributed application for processing CSV files. It is designed to be deployed across multiple machines for parallel processing, so that no task is processed twice and data is handled efficiently.

Project Highlights

Asynchronous and distributed architecture.
Fault-tolerant design with automatic process restarts.
Scalable to handle multiple CSV sources and processing tasks.
Corrupted files are moved to an error directory.
Real-time exchange rate updates for accurate currency conversion.
Data integrity measures to prevent incomplete, duplicate, or corrupt data.

Key Features

1. CSV Fetching

Fetch CSV files from multiple API endpoints.
Dynamic addition of new CSV sources.
Scheduled daily downloads.
Flexible handling of varying CSV formats.
Local storage of downloaded files.
Retry mechanism for failed CSV downloads.

2. CSV Processing

Processing of locally stored CSV files.
Real-time currency conversion using latest exchange rates.
Mapping of CSV columns to database schema.
Local storage for corrupted files.
Efficient bulk insertion and updates to the database.
Memory cleanup after processing.
Retry mechanism for CSV files that fail processing.
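
The column mapping and currency conversion steps listed above can be illustrated with a minimal sketch. The module name, the header names, and the field names below are hypothetical and are not taken from the FileProcessor code base. Floats are used only to keep the example dependency-free; a real implementation would more likely use a decimal type for money.

```elixir
defmodule ColumnMappingSketch do
  # Hypothetical mapping from CSV header names to schema fields.
  @column_map %{
    "Product Name" => :name,
    "Price" => :price,
    "Currency" => :currency
  }

  # Pairs each header with its value and keeps only the mapped columns.
  def map_row(headers, values) do
    headers
    |> Enum.zip(values)
    |> Enum.reduce(%{}, fn {header, value}, acc ->
      case Map.fetch(@column_map, header) do
        {:ok, field} -> Map.put(acc, field, value)
        :error -> acc
      end
    end)
  end

  # Converts an amount with the given exchange rate, rounded to two decimals.
  def convert(amount, rate) when is_number(amount) and is_number(rate) do
    Float.round(amount * rate, 2)
  end
end
```

Unmapped columns are dropped silently in this sketch, which is one way to tolerate sources whose CSV formats differ.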

3. Exchange Rate Updates

Daily fetching and updating of currency exchange rates.
Storage of exchange rates in the database for quick access.
Retry mechanism for failed cases.
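
The retry mechanism mentioned for fetching, processing, and exchange rate updates is not shown in this README. One common way to build such a mechanism is a bounded retry with a doubling delay, sketched below. The module name, the default attempt count, and the default delay are assumptions, not the project's actual values.

```elixir
defmodule RetrySketch do
  # Calls `fun` until it returns {:ok, _} or the attempts are used up.
  # `fun` is expected to return {:ok, result} or {:error, reason}.
  def with_retry(fun, attempts \\ 3, delay_ms \\ 1_000)

  def with_retry(fun, attempts, delay_ms) when is_function(fun, 0) and attempts > 0 do
    case fun.() do
      {:ok, result} ->
        {:ok, result}

      # Last attempt failed: give up and return the error.
      {:error, reason} when attempts == 1 ->
        {:error, reason}

      # Wait, then try again with one attempt fewer and twice the delay.
      {:error, _reason} ->
        Process.sleep(delay_ms)
        with_retry(fun, attempts - 1, delay_ms * 2)
    end
  end
end
```

For example, RetrySketch.with_retry(fn -> fetch_rates() end) would call a hypothetical fetch_rates/0 up to three times before returning the last error.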

4. Distributed Processing

Use of Horde for distributed registry and supervision.
Libcluster for automatic clustering of nodes.
Even distribution of tasks across available nodes.
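
As a rough illustration of how Horde and Libcluster are typically wired into a supervision tree, the sketch below starts a cluster supervisor, a distributed registry, and a distributed dynamic supervisor. It assumes the horde and libcluster packages are listed as dependencies. All module and process names under ClusterSketch are hypothetical, and the Epmd topology with two fixed hosts is an assumption chosen to match the local node names used later in this README; the project's real configuration may differ.

```elixir
defmodule ClusterSketch.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Assumed topology: connect to a fixed list of local nodes.
    topologies = [
      local: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"node1@127.0.0.1", :"node2@127.0.0.1"]]
      ]
    ]

    children = [
      # Libcluster connects the nodes listed in the topology.
      {Cluster.Supervisor, [topologies, [name: ClusterSketch.ClusterSupervisor]]},
      # Horde registry: a given key is registered only once across the cluster.
      {Horde.Registry, [name: ClusterSketch.Registry, keys: :unique, members: :auto]},
      # Horde supervisor: children are started on, and restarted across, cluster members.
      {Horde.DynamicSupervisor,
       [name: ClusterSketch.TaskSupervisor, strategy: :one_for_one, members: :auto]}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: ClusterSketch.Supervisor)
  end
end
```

Registering each task under a unique key in the distributed registry is one way to keep the same file from being processed on two nodes at once.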

5. Fault Tolerance

Automatic restart of failed processes.
Handling of network failures and API downtime.
Handling of files that fail processing.
Handling of corrupted files.

Setup Instructions

1. Clone the repository

  git clone https://github.com/techitdeveloper/File_ProcessingDistributed_System-Elixir.git file_processor
  cd file_processor

2. Install Dependencies

  mix deps.get

3. Configure Database - Edit config/config.exs and update the database credentials:

  config :file_processor, FileProcessor.Repo,
    username: "your_username",
    password: "your_password",
    hostname: "localhost"

4. Create and Migrate Database

  mix ecto.create
  mix ecto.migrate

5. Local Development Setup

Serving CSV Files for Testing

To simulate a remote file server during development and testing, we use Python's built-in HTTP server. This allows us to serve CSV files locally, mimicking the behavior of fetching files from a remote server.

Open a new terminal and navigate to the directory containing your test CSV files:

  cd csv_files_test

Start the Python HTTP server:

  python3 -m http.server

Your CSV files in the csv_files_test directory are now accessible via http://localhost:8000/. For example, if you have a file named test_data.csv in this directory, it would be accessible at http://localhost:8000/test_data.csv.

6. Start the Application

  iex -S mix

Running in a Distributed Environment

1. Start first node

  iex --name node1@127.0.0.1 -S mix

2. Start another node

  iex --name node2@127.0.0.1 -S mix

Repeat this step for as many nodes as you want to add, changing the node name each time.
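
To check that the nodes have found each other, the standard Node functions can be run in the iex shell of any node. This is a small sketch; the node name passed to Node.connect/1 assumes the names used in the steps above.

```elixir
# Name of the current node, e.g. :"node1@127.0.0.1" when started as shown above.
Node.self()

# Names of the other nodes this node is connected to (an empty list if none).
Node.list()

# Manual connection attempt, useful if automatic clustering did not connect the nodes.
Node.connect(:"node2@127.0.0.1")
```

If Node.list/0 stays empty on every node, the nodes are running in isolation and tasks will not be distributed.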

Adding CSV Sources

To add a new CSV source, use the FileProcessor.Api.add_csv_source/1 function:

  FileProcessor.Api.add_csv_source("http://localhost:8000/test1.csv")

Removing CSV Sources

To remove a CSV source, use the FileProcessor.Api.remove_csv_source/1 function:

  FileProcessor.Api.remove_csv_source("http://localhost:8000/test1.csv")

Listing CSV Sources

To list all current CSV sources, use the FileProcessor.Api.list_csv_sources/0 function:

  FileProcessor.Api.list_csv_sources()

Monitoring

The application uses Elixir's Logger for comprehensive logging. Monitor the console output or configure a more advanced logging solution as needed.
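
As a starting point for tuning the console output, a Logger configuration along these lines could be placed in config/config.exs. The level, format string, and metadata keys are illustrative assumptions rather than the project's settings. Recent Elixir versions configure the console formatter under a different key, so check the Logger documentation for the version in use.

```elixir
import Config

# Log informational messages and above.
config :logger, level: :info

# Console output with a timestamp, the calling module, and the log level.
config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:module]
```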
