I have a job to do: a recital with 3 cameras and 1 audio source. I want to create a real-time live stream managed by an app embedded in a website, and I'm wondering whether Java is a good fit for this. I'd appreciate your opinion...

This is the situation:

A) I have:
1 stereo mic connected to a Mac
3 IP cams connected to a modem
3 DV cams
B) I need:
a small free app integrated into my website that lets anyone on the internet switch in real time between the 3 video sources I offer, while the audio source keeps playing without interruption... like surveillance software, or like a virtual control room (even with a 60-second buffer, it doesn't matter; the essential thing is sync between the 3 videos and the audio signal).

I think I have three options:
Option 1:
3 IP video streams from the 3 IP cams, through the modem to the web
1 audio stream from the Mac to the web
1 app embedded in my website that people use to watch the performance, which lets them change the camera view (among the 3 I offer) without interrupting the audio signal... all in sync, like surveillance software, or better, like a virtual control room
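The option-1 client logic can be sketched in a few lines (language aside; any runtime with separate audio and video players would behave the same way). The key point is that switching cameras only swaps the video source, while the audio object is opened once and never restarted. The URLs here are hypothetical placeholders.

```python
# Minimal sketch of an option-1 "virtual control room" client:
# three independent video streams, one independent audio stream.
# Switching cameras swaps only the video source; the audio object
# is never touched, so playback is uninterrupted.

class ControlRoom:
    def __init__(self, video_urls, audio_url):
        self.video_urls = list(video_urls)
        self.audio_url = audio_url      # opened once, never restarted
        self.active = 0                 # index of the camera on screen

    def switch_to(self, cam):
        if not 0 <= cam < len(self.video_urls):
            raise ValueError("no such camera")
        self.active = cam               # audio_url is left untouched

    @property
    def video_src(self):
        return self.video_urls[self.active]
```

In a real browser app this state would drive the `src` of a video element, while a separate, always-playing audio element carries the shared clock.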
Option 2:
3 DV cams, each multiplexed in real time with the single audio signal from the mic, in sync (mic and DV cams are connected to the Mac)
these 3 multiplexed video signals then output as 3 separate audio/video IP live streams
1 app embedded in my website that people use to watch the performance, which lets them change the camera view (among the 3 multiplexed streams I offer as live streams) like a virtual control room
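One way option 2 could be wired, assuming ffmpeg is the encoder (my assumption, not something the setup dictates): one process per DV cam, each muxing the same mic input and pushing to a streaming server. The device names and the RTMP endpoint below are hypothetical placeholders.

```python
# Sketch of the option-2 server side: three ffmpeg invocations, each
# pairing one video input with the single shared audio input.

def mux_command(video_input, audio_input, rtmp_url):
    """Build an ffmpeg argv that muxes one video with the shared audio."""
    return [
        "ffmpeg",
        "-i", video_input,             # e.g. one DV capture source
        "-i", audio_input,             # the single mic input, reused 3x
        "-map", "0:v", "-map", "1:a",  # pair this video with that audio
        "-c:v", "libx264", "-c:a", "aac",
        "-f", "flv", rtmp_url,         # push to a streaming server
    ]

commands = [
    mux_command(f"cam{i}.dv", "mic.wav", f"rtmp://example.com/live/cam{i}")
    for i in range(1, 4)
]
```

Because all three streams carry the same audio track, switching between them on the client means swapping whole streams, so keeping them time-aligned at the encoder is what makes the switch feel seamless.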
Option 3:
3 DV cams into the Mac
1 mic into the same Mac
software that multiplexes the audio with a single stitched 3-sector video (a sort of double split-screen built from the 3 video inputs):
[1]+[2]+[3] = [1|2|3]
this software turns the resulting audio + super-widescreen video into a single live stream on the web
a simple app (embedded in my website) that "sees" this stream and works like a two-layer masking tool: there are 2 layers (the top layer is the mask, the bottom layer is the live stream); moving the bottom stream layer left or right hides 2 sectors of the video behind the opaque part of the top mask layer, so only 1 sector is shown:
[1|-|-] <- [-|2|-] -> [-|-|3]
this simulates a simple video mixer with synchronized audio.
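The two option-3 pieces can be sketched like this, assuming ffmpeg's `hstack` filter does the server-side stitching and the client masks by sliding the wide frame behind a fixed-size viewport (the filter syntax is real ffmpeg; the sector width is a placeholder):

```python
# Option-3 sketch: (1) an ffmpeg filtergraph that glues 3 video inputs
# side by side into one wide frame, and (2) the masking math - showing
# camera i means shifting the wide frame so only sector i is visible.

def hstack_filter(n=3):
    """ffmpeg filtergraph string stitching n video inputs horizontally."""
    pads = "".join(f"[{i}:v]" for i in range(n))
    return f"{pads}hstack=inputs={n}[v]"

def strip_offset(sector, sector_width):
    """Horizontal shift of the wide frame so `sector` (0-based) shows."""
    return -sector * sector_width
```

In a browser, that offset would drive something like a CSS transform on the video element inside a viewport one sector wide; since all three sectors live in one stream with one audio track, sync is automatic, at the cost of encoding triple-width video.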

This is the question:
which is the best option, and what kind of software do I need for this job (converting the IP streams, building the app that switches the video, embedding the app in my website)?
Can Java help me, or should I use something else?