FileProcessor is an asynchronous, distributed application for processing CSV files. It is designed to run across multiple machines for parallel processing, ensuring that no task is processed more than once and that data is handled efficiently.
- Asynchronous and distributed architecture.
- Fault-tolerant design with automatic process restarts.
- Scalable to handle multiple CSV sources and processing tasks.
- Relocation of corrupted files to an error directory.
- Real-time exchange rate updates for accurate currency conversion.
- Data integrity measures to prevent incomplete, duplicate, or corrupt data.
- Fetch CSV files from multiple API endpoints.
- Dynamic addition of new CSV sources.
- Scheduled daily downloads.
- Flexible handling of varying CSV formats.
- Local storage of downloaded files.
- Retry mechanism for failed CSV files.
- Processing of locally stored CSV files.
- Real-time currency conversion using latest exchange rates.
- Mapping of CSV columns to database schema.
- Local storage for corrupted files.
- Efficient bulk insertion and updates to the database.
- Memory cleanup after processing.
- Retry mechanism for failed CSV files.
- Daily fetching and updating of currency exchange rates.
- Storage of exchange rates in the database for quick access.
- Retry mechanism for failed exchange rate updates.
- Use of Horde for distributed registry and supervision.
- Libcluster for automatic clustering of nodes.
- Even distribution of tasks across available nodes.
- Automatic restart of failed processes.
- Handling of network failures and API downtime.
- Handling of failed processed files.
- Handling of corrupted files.
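The retry mechanisms listed above are not spelled out in this README. As a rough illustration only, here is a minimal sketch of retrying with exponential backoff. `RetrySketch`, its argument names, and the default delays are all hypothetical and are not taken from the FileProcessor code base.

```elixir
defmodule RetrySketch do
  @moduledoc """
  Hypothetical helper, not part of FileProcessor: retries a zero-arity
  function with exponential backoff.
  """

  # Runs `fun` up to `max_attempts` times. `fun` must return
  # {:ok, value} or {:error, reason}. The delay doubles after each failure.
  def run(fun, max_attempts \\ 3, base_delay_ms \\ 100)
      when is_function(fun, 0) and max_attempts >= 1 do
    attempt(fun, 1, max_attempts, base_delay_ms)
  end

  defp attempt(fun, n, max_attempts, delay_ms) do
    case fun.() do
      {:ok, value} ->
        {:ok, value}

      {:error, _reason} when n < max_attempts ->
        # Wait, then try again with a doubled delay.
        Process.sleep(delay_ms)
        attempt(fun, n + 1, max_attempts, delay_ms * 2)

      {:error, reason} ->
        # Out of attempts: surface the last error to the caller.
        {:error, reason}
    end
  end
end
```

A caller would wrap a download or an exchange rate request in an anonymous function, for example `RetrySketch.run(fn -> {:error, :timeout} end, 3, 10)`, which gives up after three attempts and returns `{:error, :timeout}`.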
git clone https://github.com/techitdeveloper/File_ProcessingDistributed_System-Elixir.git
cd file_processor
mix deps.get
config :file_processor, FileProcessor.Repo,
  username: "your_username",
  password: "your_password",
  hostname: "localhost"
mix ecto.create
mix ecto.migrate
To simulate a remote file server during development and testing, we use Python's built-in HTTP server. This allows us to serve CSV files locally, mimicking the behavior of fetching files from a remote server.
- Open a new terminal and navigate to the directory containing your test CSV files:
cd csv_files_test
- Start the Python HTTP server:
python3 -m http.server
- Your CSV files in the csv_files_test directory are now accessible via http://localhost:8000/. For example, a file named test_data.csv in this directory is accessible at http://localhost:8000/test_data.csv.
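To confirm that the test server is reachable before starting the application, a quick check from a plain `iex` session can help. The sketch below uses `:httpc` from Erlang's `:inets` application. `FetchCheck` is a hypothetical helper, not part of FileProcessor, and the project itself may use a different HTTP client.

```elixir
defmodule FetchCheck do
  # Hypothetical helper, not part of FileProcessor.
  # Returns {:ok, body} for HTTP 200 and {:error, reason} otherwise.
  def get(url) when is_binary(url) do
    # :httpc ships with Erlang/OTP in the :inets application.
    {:ok, _apps} = Application.ensure_all_started(:inets)

    # :httpc expects the URL as a charlist; no extra headers are sent.
    request = {String.to_charlist(url), []}

    case :httpc.request(:get, request, [timeout: 5_000], body_format: :binary) do
      {:ok, {{_version, 200, _reason}, _headers, body}} ->
        {:ok, body}

      {:ok, {{_version, status, _reason}, _headers, _body}} ->
        {:error, {:http_status, status}}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

With the Python server running, `FetchCheck.get("http://localhost:8000/test_data.csv")` should return `{:ok, body}` containing the file contents. Inside a Mix project, `:inets` may need to be listed under `extra_applications` in `mix.exs` for this to work.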
iex -S mix
iex --name node1@127.0.0.1 -S mix
iex --name node2@127.0.0.1 -S mix
Repeat this step for as many nodes as you want to add, changing the node name each time.
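Once two or more nodes are running, it can be useful to check that they actually see each other. The calls below are standard Elixir `Node` functions; the node names assume the `node1`/`node2` commands shown above.

```elixir
# Run in the IEx shell of node1 after node2 has started.

# Name of the current node, e.g. :"node1@127.0.0.1".
Node.self()

# Other nodes this node is connected to; an empty list means no cluster yet.
Node.list()

# Manual connectivity check: :pong if reachable, :pang otherwise.
Node.ping(:"node2@127.0.0.1")
```

If `Node.list()` stays empty, the nodes have not clustered, and the clustering configuration or the node names are the first things to check.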
To add a new CSV source, use the FileProcessor.Api.add_csv_source/1 function:
FileProcessor.Api.add_csv_source("http://localhost:8000/test1.csv")
To remove a CSV source, use the FileProcessor.Api.remove_csv_source/1 function:
FileProcessor.Api.remove_csv_source("http://localhost:8000/test1.csv")
To list all current CSV sources, use the FileProcessor.Api.list_csv_sources/0 function:
FileProcessor.Api.list_csv_sources()
The application uses Elixir's Logger for comprehensive logging. Monitor the console output or configure a more advanced logging solution as needed.
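As a starting point for tuning the console output, a Logger configuration along the following lines could be placed in `config/config.exs`. The level, format string, and metadata keys here are illustrative assumptions, not the project's actual settings.

```elixir
# config/config.exs (sketch; values are illustrative, not the project's own)
import Config

# Drop :debug messages; only :info and above are logged.
config :logger, level: :info

# Console output: timestamp, selected metadata, level, then the message.
config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:module]
```

Option names for the console backend have shifted between Elixir releases, so it is worth checking the Logger documentation for the version in use.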