Hi, what is the best way to manage the databases I use across different projects? I use a lot of CSVs that I need to prepare every time I work with them (I just copy and paste the code from other projects), but I would like to build a module that I can import and that holds all the processing for each database. For example, for this database I usually do columns = [(configuration of, my columns)], names = [names], dates = [list of date columns], dtypes = {column: type},
then database_1 = pd.read_fwf(**kwargs), database_2 = pd.read_fwf(**kwargs), database_3 = pd.read_fwf(**kwargs)...
Then database = pd.concat([database_1...])
But I would like to have a module that I could import, with all my databases and their ETL configuration in it, so I could just do something like 'database = my_module.database' to load the database without repeating that whole process every time.
Thanks for any help.
Yes, kinda.
They get updated by the accounting team each month. Some of them are CSVs, others come from an Access database file, and others from the SQL Server.
Some of the code needs to be run each month with the updated databases, but there are also a lot of ad hoc statistical studies my boss asks for that use the same databases.
I guess so. And no, the accountants keep the same filenames but change the directory lmao.
Thanks, I'm checking it out.
import sys
sys.path.append('my\\modules\\directory')
import my_module
You're right, I'm an actuary. I wanted to do computer science instead of actuarial science, but I thought it would be better to get an actuarial degree and then do a master's in CS (still in planning, maybe 2026). I'm the only guy at the company who uses Python, and people here think I'm a genius because I have automated some boring things from Excel.
If things are changing a bit each month, then in your module, rather than a plain variable assignment, you might want a function that takes parameters for the things that can change.
Then you can call it like this:
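As a rough sketch of both pieces, the function and the call, assuming fixed-width files read with pd.read_fwf as in the original post. Every name here (my_module.py, COLSPECS, NAMES, FILENAMES, database_paths, load_database) and the column layout are made up for illustration, so swap in your own configuration:

```python
# my_module.py (hypothetical file name)
from pathlib import Path

# Hypothetical configuration: replace with your real layout.
COLSPECS = [(0, 10), (10, 20)]          # character ranges of each column
NAMES = ["date", "amount"]              # column names, one per colspec
READ_KWARGS = {
    "colspecs": COLSPECS,
    "names": NAMES,
    "header": None,                     # the files have no header row
    "parse_dates": ["date"],
    "dtype": {"amount": "float64"},
}
# The filenames stay the same each month; only the directory changes.
FILENAMES = ["database_1.txt", "database_2.txt", "database_3.txt"]


def database_paths(base_dir, filenames=FILENAMES):
    """Build the full path of every file inside this month's directory."""
    return [Path(base_dir) / name for name in filenames]


def load_database(base_dir, filenames=FILENAMES, **overrides):
    """Read every file in base_dir and concatenate them into one DataFrame.

    Extra keyword arguments override the default read_fwf settings.
    """
    # Imported here so that importing the module itself does not load
    # pandas until data is actually read.
    import pandas as pd

    kwargs = {**READ_KWARGS, **overrides}
    frames = [pd.read_fwf(path, **kwargs) for path in database_paths(base_dir, filenames)]
    return pd.concat(frames, ignore_index=True)


# Example call from another project (the directory is a placeholder):
# import my_module
# database = my_module.load_database("path/to/this/months/folder")
```

Since only the directory changes from month to month, it is the one required argument; the filenames and read settings are defaults you can still override per call.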
... hope that makes some sense.