I'm looking forward to the update...
One more for tonight..
Ozzy Osbourne - Believer (Source Track 96.0 kHz Sample Rate FLAC) https://www.mediafire.com/file/cwc2w.../Believer.flac
Ozzy Osbourne - Believer (Instrumental) https://www.mediafire.com/file/lv2o0...strumental.mp3
Ozzy Osbourne - Believer (Acapella) https://www.mediafire.com/file/v06jh...ever_Vocal.mp3
I downloaded the new baseline... how do you get it to activate, batch process, or use a genre-specific process? Does it recognize what type of music it is? I'm lost lol
Tried to train, but got this error at the end:
1 +- 03_bill_mix.mp3 +- 03_bill_inst.mp3
2 +- 04_fasc_mix.mp3 +- 04_fasc_inst.mp3
3 +- 01_amd_mix.mp3 +- 01_amd_inst.mp3
4 +- 02_beat_mix.mp3 +- 02_beat_inst.mp3
0%|          | 0/4 [00:00<?, ?it/s]C:\Users\Robert\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
C:\Users\Robert\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:53<00:00, 28.25s/it]
0it [00:00, ?it/s]
# epoch 0
* inner epoch 0
Traceback (most recent call last):
File "train.py", line 223, in <module>
main()
File "train.py", line 194, in main
X_train, y_train, model, optimizer, args.batchsize, instance_loss)
File "train.py", line 75, in train_inner_epoch
return sum_loss / len(X_train)
ZeroDivisionError: division by zero
This error is due to your training set being too small. You need a bare minimum of 15 pairs to start training at all, and if you're training from scratch like this you'll need at LEAST 50-75 pairs for it to be effective. Your training/validation numbers won't move with sets smaller than 50; you'll end up wasting your system resources and being sorely disappointed with your model's performance.
If you choose to train with a set of 15-50 pairs, just finetune one of the baseline models (commands are in the main thread). I also figured out how to train effectively with a GPU, so train with your GPU if you have one.
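For context, the crash itself comes from train.py averaging the loss over an empty training list (`sum_loss / len(X_train)` with `len(X_train) == 0`). A minimal sketch of a guard, using a hypothetical helper name rather than the repo's actual code:

```python
def average_loss(losses):
    """Mean training loss, raising a clear error instead of ZeroDivisionError."""
    if not losses:
        raise ValueError(
            "Training set is empty: add more mix/instrumental pairs "
            "(15 minimum, 50+ recommended)."
        )
    return sum(losses) / len(losses)
```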
A new model has been posted to the main page! Please make sure to use it with the new A.I. provided as it won't work with the old one.
Hey Anjok! First of all thank you for this awesome AI, it works really well and does a great job separating the tracks.
But now I have a problem with the newly uploaded model.
When I try to run it using the GPU, I get the following error:
This didn't happen with the old version. Is there a way to solve it? Because using the CPU is reaaaally slow. Thank you!
Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 64, in main
pred = model.predict(X_window)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\nets.py", line 79, in predict
h = self.full_band_net(self.bridge(h))
File "C:\Users\KennA\Documents\vocal-removerV2\lib\nets.py", line 34, in __call__
h = self.dec1(h, e1)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\layers.py", line 79, in __call__
x = spec_utils.crop_center(x, skip)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\spec_utils.py", line 20, in crop_center
return torch.cat([h1, h2], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 2.00 GiB total capacity; 948.49 MiB already allocated; 308.74 MiB free; 137.51 MiB cached)
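With only 2 GiB of VRAM, the model's forward pass simply doesn't fit. One general mitigation (a sketch only; the actual windowing in inference.py may differ) is to run inference over smaller time windows, so less memory is allocated per pass:

```python
def split_windows(frames, window_size):
    """Yield the time frames of a spectrogram in fixed-size chunks.

    A smaller chunk means a smaller tensor per forward pass, which
    lowers peak GPU memory use at the cost of more passes.
    """
    for start in range(0, len(frames), window_size):
        yield frames[start:start + window_size]

# Illustration with plain lists standing in for spectrogram frames:
chunks = list(split_windows(list(range(10)), 4))
# chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk would then be predicted separately and the results stitched back together.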
GPU not much cop mate.....
So how does it work after you have trained it?
Does it recognize whether it's a rock song, etc.?
I made it to the conversion step and then got an error I can't figure out:
C:\Users\xxxx\Documents\vocal-remover>python inference.py --input Daredevil.mp3 --gpu 0
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
loading model... done
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
loading wave source... Traceback (most recent call last):
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 129, in load
with sf.SoundFile(path) as sf_desc:
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 1184, in _open
"Error opening {0!r}: ".format(self.name))
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'Daredevil.mp3': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 39, in main
args.input, args.sr, False, dtype=np.float32, res_type='kaiser_fast')
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 162, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 186, in __audioread_load
with audioread.audio_open(path) as input_file:
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\audioread\__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError
Yep, that fixed it! Thanks :) I also needed to use WAV files (MP3s won't work, at least not on my system).
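A quick stdlib sanity check (a hypothetical helper, not part of the repo) can confirm a file really is a readable WAV before queueing it for inference, avoiding the RuntimeError / NoBackendError seen in the traceback above:

```python
import wave

def is_readable_wav(path):
    """Return True if the file opens as a PCM WAV with at least one frame.

    Useful as a pre-flight check before running inference.py, since
    some compressed formats fail to decode on systems without an
    MP3-capable audioread backend.
    """
    try:
        with wave.open(path, "rb") as w:
            return w.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        return False
```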
I threw a couple of the toughest conversions I know of at it to see. On the whole, the primary AI over in the other topic is superior, at least presently: this one leaves more trace vocal across the tracks, like a lingering echo, instead of the static the other one gives. But both are in the same ballpark, which is way ahead of any other program.
One fascinating thing, though, is that this one actually seems to handle some things BETTER than the other, although I'd need to do more testing to confirm. I'd say this is the exception, not the rule. For example, on Steven Wilson - Blackest Eyes it does a poorer job on the verse sections but a superior job on the bridge. On Smashing Pumpkins - JellyBelly it does a poorer job on the overall vocals, since there is trace bleed where the other AI eliminates it entirely; but on certain parts of the song the other AI completely fails to remove any vocals at all, and while this one doesn't do a perfect job by any stretch, it does a noticeably better job on those parts.
I've had very little time to test, and I know there are more builds to come (which I look forward to), so these are just very early observations. Genre-specific models fascinate me as well. What if one part of a rock song converts better with a pop-oriented model, and could be spliced in with the rest of the song converted with the rock model to create a complete product? I'm already getting that vibe just comparing this model with the other one. Even failing that, more diverse coverage of quality results is likely.
Got the new AI up and running and decided on Gerry Rafferty's Baker Street, and what an awesome job it did. Some instruments ended up on the vocal track, but I just put them back into the instrumental track. This conversion took three hours to complete.
Instrumental
https://www.mediafire.com/file/qtla5...nstruments.mp3
I eventually got it working, but it must hog everything on my laptop... chucked a Halestorm song at it... 15 and a half hours... no chance... so I threw it onto my son's gaming desktop... 16 seconds later the song was done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input SLFNEW.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 43/43 [00:18<00:00, 2.30it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input SMITH.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 69/69 [00:28<00:00, 2.39it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input ZZYZX.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 54/54 [00:22<00:00, 2.35it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input HILL.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:23<00:00, 2.11it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input CHAOS.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [00:18<00:00, 2.20it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input JAD.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 39/39 [00:17<00:00, 2.29it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2>
Oh god my head is spinning...
Personally, I'd like to see a GUI with anything current, and then a way to simply add the updates later, either by certain file types or a simple update command tied to the GitHub repo.
Compared to other programs, and against the earlier version, v2 is smoking.
The only drawback is you defo need a top-notch PC to get things done fast...
If this is based on 300-odd pairs, the 1000-pair edition that Anjok is maybe gonna release is going to be immense.
Think I maybe need to rob a bank for a new PC.
This is actually something I'm testing now! I had to give my PC a break from training for a bit because I didn't want to burn it out. I'm almost done building a new one that's going to be 100x more powerful. Once I get my remaining parts in the mail, I'm going to start training aggressively with new settings and different batch sizes.
I'm going to attempt to train a 1000-pair model with the same number of layers, so you guys won't have to worry about PC drawbacks. That said, if the performance doesn't improve (or improves only marginally), I may have to add more layers to the model, which will make it less friendly to older/slower PCs.
I'll do my best though!
I'll still be working on it concurrently with the models. However, I won't be dedicating as much time to it until I get the models to perform their best. Once I have something decent mocked up, I'll throw it on GitHub if we have any coders here who want to contribute!
Hahaha, I say bring on the beast xD Actually, it's kinda exciting that you're upgrading your PC.
Mileage may vary on this, but I have had great success underclocking my GPU to manage heat. Not that I absolutely need to, since I'm running a GTX 1070, but I've always got MSI Afterburner running. For example, one of the previous models had me at 80-85°C, so I underclocked the core from 1700MHz to 1200MHz and the memory from 4,000MHz to 3,500MHz. Presto: a cool and comfy 68-72°C.
I do the same when I'm gaming. I can still run The Witcher 3 on my laptop at 60FPS underclocked and hold a steady 70°C, rather than the 85°C it would otherwise run the game at.
My current PC is pretty tough and was definitely getting the job done; I managed to keep the CPU under 98°C and the GPU under 86°C during some intense training sessions. The issue is that it's 5 years old, my motherboard is capped at 32GB of RAM, and I can't upgrade the CPU or GPU to what I really need. I'm pretty sure that if I kept training on it, its lifespan would decrease a lot quicker than with hardware optimized for deep learning.
My new setup will have 64GB of RAM (expandable to 128GB), an RTX 2080 Ti with 11GB of VRAM, and a far stronger processor. It's definitely going to spit out more models in a shorter time frame and save time determining the quality of my datasets. It takes at least 15-20 epoch runs to know if a dataset is effective; my new PC will get there in maybe 3 or 4 hours with a dataset of 320 pairs, whereas with my current PC and its GTX 1060 it takes a little over 24 hours to reach 15-20 epochs on the same 320-pair dataset...
I'd rather use my current PC to code the GUI, analyze models, and run test conversions (which take about 15 seconds per track), all while my new one trains.
I'm pretty excited about this too!!
I have made a fascinating discovery!
This AI converts early Death tracks SUBSTANTIALLY better than the primary AI in the other topic. It's no contest; it just crushes it in every way imaginable. Take a look at Zombie Ritual from Scream Bloody Gore; convert it with each and compare side by side if you like.
This does appear to be the exception rather than the rule, but there are certainly places where this AI (or this current model of this AI) does a superior job.
Another great example to look at is The Cure - One Hundred Years
The other AI does a slightly better job overall at stripping the vocals, but this model does not pull the drum echo into the vocal track, making it superior in that regard. I also found that Black Sabbath - Symptom of the Universe is stronger up front (though the back half of the song is a mess): another heavy-reverb song...
Scream Bloody Gore is also a reverb heavy album...
I don't want to jump to conclusions, because it may be more complicated than reverb, but it certainly sounds like this model is far superior at handling reverb. I don't know for sure, but identifying why Scream Bloody Gore in particular shows such a crushing difference would be interesting.
[Attachment 642: GUI mock-up screenshot]
Here is a mock-up of the GUI I'm working on for this AI. I'm still working on coding and configuring it. More features and options will be added! This is my first draft of the application, so the final product will look better than this.
I'll also be releasing a GUI for training, but that will come well after the conversion GUI is released.
That's really good to know! I've noticed this model is far better at removing reverbs as well. Funny thing is this dataset actually included a lot of pop and dance music (courtesy of some users here who sent me some great pairs). It makes me even more excited to get the rest of my PC parts next week!
I cannot get this to work on Windows 10 64-bit or Windows 8.1 64-bit, and I'd like to know what the possible issues are. I've followed the instructions and still get this error:
'pip' is not recognized as an internal or external command,
operable program or batch file.
Can you please list instructions for how to use this on Linux/Ubuntu?
Thanks.
This should fix the "pip" issue. https://www.youtube.com/watch?v=zYdHr-LxsJ0
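For what it's worth, "'pip' is not recognized" usually means Python's Scripts folder isn't on PATH. Besides the video, a common workaround is to run pip through the interpreter itself (the `python3` fallback below is just for non-Windows shells; on Windows, `py -m pip` also works if `python` isn't found):

```shell
# Run pip as a module of an interpreter that IS on PATH;
# this sidesteps the missing Scripts folder entirely.
python -m pip --version || python3 -m pip --version
# Then install the packages listed in the main thread the same way, e.g.:
# python -m pip install <package>
```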
Not sure what this is about, but after trying to convert a song it gives this and then just sits at a blinking cursor.
C:\Users\zensh\Documents\vocal-removerV2>python inference.py --input Made Of Stone.mp3
C:\Program Files\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
C:\Program Files\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
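One thing worth checking in the command above: `Made Of Stone.mp3` contains spaces and isn't quoted, so the shell hands `--input` only `Made` and passes `Of` and `Stone.mp3` as stray arguments. This is a general shell rule rather than anything specific to inference.py; a quick demonstration:

```shell
# Unquoted, the shell splits the filename into three separate words:
set -- --input Made Of Stone.mp3
echo "$#"    # prints 4 (four arguments)
# Quoted, --input receives the whole filename as one argument:
set -- --input "Made Of Stone.mp3"
echo "$#"    # prints 2 (two arguments)
```

So the command should read `python inference.py --input "Made Of Stone.mp3"`.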
And then I tried running the pip install command again just to be sure, and it says I'm still missing files.
C:\Users\zensh\Documents\vocal-removerV2>pip install torch==1.3.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
Defaulting to user installation because normal site-packages is not writeable
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already satisfied: torch==1.3.0 in c:\program files\python37\lib\site-packages (1.3.0+cu92)
Requirement already satisfied: torchvision==0.4.0 in c:\program files\python37\lib\site-packages (0.4.0+cu92)
Requirement already satisfied: numpy in c:\program files\python37\lib\site-packages (from torch==1.3.0) (1.18.4)
Requirement already satisfied: pillow>=4.1.1 in c:\program files\python37\lib\site-packages (from torchvision==0.4.0) (7.1.2)
Requirement already satisfied: six in c:\program files\python37\lib\site-packages (from torchvision==0.4.0) (1.14.0)
Could not build wheels for torch, since package 'wheel' is not installed.
Could not build wheels for torchvision, since package 'wheel' is not installed.
Could not build wheels for numpy, since package 'wheel' is not installed.
Could not build wheels for pillow, since package 'wheel' is not installed.
Could not build wheels for six, since package 'wheel' is not installed.
C:\Users\zensh\Documents\vocal-removerV2>
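The "Could not build wheels" lines at the end have a standard remedy: pip uses the `wheel` package to build wheels, so install it first and then retry the install from the post. (The `python3` fallback is only for non-Windows shells.)

```shell
# Install the 'wheel' package so pip can build wheels:
python -m pip install wheel || python3 -m pip install wheel
# Then retry the install from the post:
# python -m pip install torch==1.3.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
```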
Finally got it going. Takes hours to convert. Hopefully I can get a more powerful PC soon.